stackOverflowed

Posted on May 22

Stop Using Data Loader for Backfills: A Guide to Parameterized Batch Apex

#salesforce #apex #programming #tutorial

Every Salesforce team hits the same wall eventually. A business stakeholder walks over and says: "We need to update the Region__c field on 2.3 million Account records based on a new territory model." Someone opens Data Loader, exports a CSV, runs a VLOOKUP in Excel, and re-imports. It takes a full afternoon. Two weeks later, the territory model changes again.

There's a better way. Batch Apex lets you encode the logic of that update — not just the data — so it's repeatable, testable, auditable, and parameterized for the next time it happens. This article covers how to design batch classes that are genuinely reusable across backfill scenarios, the trade-offs you need to understand before choosing this approach, and how to monitor jobs once they're running.

The Anatomy of a Batch Class (Quick Refresher)

If you've written Batch Apex before, skim this section. If not, here's the contract you're implementing:

public class AccountRegionBackfill implements Database.Batchable<SObject> {

    // 1. QUERY — what records are we operating on?
    public Database.QueryLocator start(Database.BatchableContext bc) {
        return Database.getQueryLocator(
            'SELECT Id, BillingState FROM Account WHERE Region__c = null'
        );
    }

    // 2. PROCESS — what do we do to each chunk?
    public void execute(Database.BatchableContext bc, List<Account> scope) {
        for (Account acc : scope) {
            acc.Region__c = TerritoryUtil.deriveRegion(acc.BillingState);
        }
        update scope;
    }

    // 3. CLEANUP — what happens when we're done?
    public void finish(Database.BatchableContext bc) {
        AsyncApexJob job = [
            SELECT TotalJobItems, JobItemsProcessed, NumberOfErrors
            FROM AsyncApexJob WHERE Id = :bc.getJobId()
        ];
        System.debug('Completed: ' + job.JobItemsProcessed + '/' +
                      job.TotalJobItems + ' batches, ' +
                      job.NumberOfErrors + ' errors');
    }
}

You invoke it from Anonymous Apex, a trigger, or a scheduled class:

Database.executeBatch(new AccountRegionBackfill(), 200);

The 200 is the batch size — the number of records passed to each execute call. Salesforce then breaks your query result into chunks, runs execute for each chunk in its own transaction with fresh governor limits, and calls finish once everything's done.
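Since a scheduled class is one of the launch points mentioned above, here is a minimal sketch of one. The class name AccountRegionBackfillScheduler, the job name, and the 2 AM schedule are assumptions made for illustration; only Schedulable, System.schedule, and Database.executeBatch are platform APIs.

```apex
// Hypothetical wrapper: lets the platform scheduler launch the backfill.
public class AccountRegionBackfillScheduler implements Schedulable {
    public void execute(SchedulableContext sc) {
        // Same call you would make from Anonymous Apex.
        Database.executeBatch(new AccountRegionBackfill(), 200);
    }
}

// Register it once from Anonymous Apex. The cron fields are:
// seconds minutes hours day-of-month month day-of-week
// System.schedule(
//     'Nightly region backfill',
//     '0 0 2 * * ?',
//     new AccountRegionBackfillScheduler()
// );
```

Keeping the wrapper this thin means the batch class stays the single place where the backfill logic lives.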

That's the textbook version. Let's talk about what the textbook leaves out.

Why Batch Apex Beats Data Loader for Backfills

Data Loader is a fine tool for one-off imports. But for data updates that involve logic, it introduces problems that compound over time.

Data Loader requires you to externalize your logic. If the update rule is "set Region__c based on BillingState, but only for Accounts created after 2020 that aren't owned by the integration user," you have to express that in an Excel formula, a Python script, or your own head. None of those live in your Salesforce org. None of them are version-controlled by default. None of them can simply be re-run when requirements change.

Batch Apex keeps the logic where the data is. The derivation logic, the filter criteria, and the error handling all live in the same class, deployed through the same CI pipeline as the rest of your code. When the territory model changes, you update one class — you don't re-export 2.3 million records and pray the VLOOKUP columns still line up.

Here's a practical comparison for the scenarios that matter most:

Scenario	Data Loader	Batch Apex
Runtime business logic (API callouts, formula calculations)	❌ Can't execute Apex mid-update	✅ Full Apex context per chunk
Trigger/flow control	⚠️ Always fires everything	✅ Bypass flags, conditional logic
Repeatability	Re-export → re-filter → re-import	Change one parameter, re-run
Auditability	Logs on someone's laptop	`AsyncApexJob` record in the org
Version control	Excel file somewhere on Slack	Deployed via CI with your codebase

There are scenarios where Data Loader is the right call: simple one-time imports of static CSVs, sandbox data seeding, or situations where a non-developer needs to do a quick update. But if you're doing the same type of update more than twice, Batch Apex pays for itself.

Parameterizing for Reuse

The mistake most teams make is building one batch class per backfill task. You end up with AccountRegionBackfill, AccountIndustryBackfill, ContactEmailNormalizeBatch, and thirty other classes that all follow the same pattern: query records matching a filter, apply a transformation, update.

Instead, build one parameterized class that accepts its configuration at construction time.

Pattern 1: Constructor-Injected Parameters

The simplest approach. Pass the query and the field-value map into the constructor:

public class GenericFieldUpdateBatch implements Database.Batchable<SObject> {

    private String query;
    private Map<String, Object> fieldValues;
    private Boolean allOrNone;

    public GenericFieldUpdateBatch(String query, Map<String, Object> fieldValues) {
        this(query, fieldValues, false);
    }

    public GenericFieldUpdateBatch(
        String query,
        Map<String, Object> fieldValues,
        Boolean allOrNone
    ) {
        this.query = query;
        this.fieldValues = fieldValues;
        this.allOrNone = allOrNone;
    }

    public Database.QueryLocator start(Database.BatchableContext bc) {
        return Database.getQueryLocator(query);
    }

    public void execute(Database.BatchableContext bc, List<SObject> scope) {
        for (SObject record : scope) {
            for (String field : fieldValues.keySet()) {
                record.put(field, fieldValues.get(field));
            }
        }
        Database.update(scope, allOrNone);
    }

    public void finish(Database.BatchableContext bc) {
        // Send notification, chain next batch, etc.
    }
}

Invocation becomes declarative:

// Backfill Region for all null-region Accounts in California
Database.executeBatch(
    new GenericFieldUpdateBatch(
        'SELECT Id FROM Account WHERE Region__c = null AND BillingState = \'CA\'',
        new Map<String, Object>{ 'Region__c' => 'West' }
    ),
    200
);

// Normalize a status field on Opportunities
Database.executeBatch(
    new GenericFieldUpdateBatch(
        'SELECT Id FROM Opportunity WHERE StageName = \'Closed/Won\'',
        new Map<String, Object>{ 'StageName' => 'Closed Won' }
    ),
    200
);

This single class replaces dozens of one-off batch classes for simple field-value updates.
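Because the class is generic, one test can cover it. The sketch below is an assumption about how such a test might look, not the author's test suite; it relies on the Region__c field used throughout this article and on the fact that Test.stopTest() makes the queued batch run before the assertions.

```apex
@IsTest
private class GenericFieldUpdateBatchTest {

    @IsTest
    static void setsFieldOnMatchingRecords() {
        insert new Account(Name = 'Acme', BillingState = 'CA');

        Test.startTest();
        Database.executeBatch(
            new GenericFieldUpdateBatch(
                'SELECT Id FROM Account WHERE BillingState = \'CA\'',
                new Map<String, Object>{ 'Region__c' => 'West' }
            ),
            200
        );
        // The batch job executes synchronously when stopTest() is called.
        Test.stopTest();

        Account updated = [SELECT Region__c FROM Account WHERE Name = 'Acme'];
        System.assertEquals('West', updated.Region__c);
    }
}
```

In a test context only a single execute call runs, so keep the test data smaller than the batch size.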

Pattern 2: Strategy Interface for Complex Logic

When the transformation isn't a static value but requires computation, inject a strategy:

public interface IBatchTransform {
    void apply(List<SObject> records);
}

public class TransformableBatch implements Database.Batchable<SObject> {

    private String query;
    private IBatchTransform transformer;

    public TransformableBatch(String query, IBatchTransform transformer) {
        this.query = query;
        this.transformer = transformer;
    }

    public Database.QueryLocator start(Database.BatchableContext bc) {
        return Database.getQueryLocator(query);
    }

    public void execute(Database.BatchableContext bc, List<SObject> scope) {
        transformer.apply(scope);
        Database.update(scope, false);
    }

    public void finish(Database.BatchableContext bc) { }
}

Now a territory-mapping backfill looks like this:

public class TerritoryTransform implements IBatchTransform {
    public void apply(List<SObject> records) {
        for (SObject rec : records) {
            Account acc = (Account) rec;
            acc.Region__c = TerritoryUtil.deriveRegion(acc.BillingState);
        }
    }
}

// Run it
Database.executeBatch(
    new TransformableBatch(
        'SELECT Id, BillingState FROM Account WHERE Region__c = null',
        new TerritoryTransform()
    ),
    200
);

The batch infrastructure and the business logic are fully separated. You can unit-test TerritoryTransform without ever invoking the batch framework.
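A test along these lines shows what that separation buys you. It is a sketch under one assumption: since the mapping inside TerritoryUtil.deriveRegion isn't shown here, the test asserts against whatever that method returns instead of a hard-coded region.

```apex
@IsTest
private class TerritoryTransformTest {

    @IsTest
    static void copiesDerivedRegionOntoAccount() {
        // No insert and no executeBatch: the transform works on in-memory records.
        Account acc = new Account(Name = 'Acme', BillingState = 'CA');

        new TerritoryTransform().apply(new List<SObject>{ acc });

        System.assertEquals(
            TerritoryUtil.deriveRegion('CA'),
            acc.Region__c
        );
    }
}
```

With no DML and no async job involved, a test like this typically runs much faster than one that goes through the batch framework.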

Pattern 3: Custom Metadata–Driven Configuration

For teams that want admins (not just developers) to control batch behavior, store your parameters in Custom Metadata:

Batch_Job_Config__mdt
├── DeveloperName:        Account_Region_Backfill
├── SOQL_Query__c:        SELECT Id, BillingState FROM Account WHERE Region__c = null
├── Batch_Size__c:        200
├── Active__c:            true
└── Notification_Email__c: admin@company.com

Your batch class reads from this at runtime. Combined with a Scheduled Apex wrapper, you can toggle jobs on and off without a deployment.
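One way to wire this up is a small launcher that looks up the config record and starts the batch. This is a sketch, not a definitive implementation: BatchJobLauncher is a hypothetical name, it assumes Batch_Size__c is a Number field and Active__c a Checkbox, and it reuses TransformableBatch and IBatchTransform from Pattern 2. It queries with SOQL because the query text may be stored in a long text field.

```apex
// Hypothetical launcher for jobs described by Batch_Job_Config__mdt records.
public class BatchJobLauncher {

    // Returns the job Id, or null when the config is missing or inactive.
    public static Id launch(String configName, IBatchTransform transformer) {
        List<Batch_Job_Config__mdt> configs = [
            SELECT SOQL_Query__c, Batch_Size__c, Active__c
            FROM Batch_Job_Config__mdt
            WHERE DeveloperName = :configName
            LIMIT 1
        ];
        if (configs.isEmpty() || configs[0].Active__c != true) {
            return null;
        }

        // Number fields come back as Decimal; fall back to 200 if left blank.
        Integer batchSize = configs[0].Batch_Size__c == null
            ? 200
            : configs[0].Batch_Size__c.intValue();

        return Database.executeBatch(
            new TransformableBatch(configs[0].SOQL_Query__c, transformer),
            batchSize
        );
    }
}
```

A call such as BatchJobLauncher.launch('Account_Region_Backfill', new TerritoryTransform()) would then start the job described by the record above. Since the SOQL text is editable in Setup, it is worth restricting who can change these records.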

Managing State and Error Handling

Tracking Errors Across Batches

By default, Batch Apex is stateless — instance variables reset between execute calls. If you need to accumulate errors or counts across the entire job, implement Database.Stateful:

public class AccountBackfillWithTracking
    implements Database.Batchable<SObject>, Database.Stateful {

    private Integer successCount = 0;
    private Integer errorCount = 0;
    private List<String> errorMessages = new List<String>();

    public Database.QueryLocator start(Database.BatchableContext bc) {
        return Database.getQueryLocator(
            'SELECT Id, BillingState FROM Account WHERE Region__c = null'
        );
    }

    public void execute(Database.BatchableContext bc, List<Account> scope) {
        for (Account acc : scope) {
            acc.Region__c = TerritoryUtil.deriveRegion(acc.BillingState);
        }
        List<Database.SaveResult> results = Database.update(scope, false);

        for (Integer i = 0; i < results.size(); i++) {
            if (results[i].isSuccess()) {
                successCount++;
            } else {
                errorCount++;
                for (Database.Error err : results[i].getErrors()) {
                    errorMessages.add(
                        scope[i].Id + ': ' + err.getMessage()
                    );
                }
            }
        }
    }

    public void finish(Database.BatchableContext bc) {
        Messaging.SingleEmailMessage mail = new Messaging.SingleEmailMessage();
        mail.setToAddresses(new List<String>{ 'admin@company.com' });
        mail.setSubject('Backfill Complete: ' + successCount + ' updated, '
                        + errorCount + ' errors');
        mail.setPlainTextBody(
            errorMessages.isEmpty()
                ? 'All records updated successfully.'
                : 'Errors:\n' + String.join(errorMessages, '\n')
        );
        Messaging.sendEmail(new List<Messaging.SingleEmailMessage>{ mail });
    }
}

The trade-off with Database.Stateful: Salesforce serializes and deserializes your instance variables between every batch execution. If you're accumulating a list of 50,000 error messages, you risk hitting the heap limit and slowing down every transaction along the way. Keep your state lean: store counts and a capped list of errors, not the full record set.

Partial Success with `Database.update(records, false)`

This is critical for backfills. The false parameter means "don't roll back the whole batch if one record fails." Without it, a single validation rule failure on record #47 kills the entire chunk of 200 records. With it, 199 succeed and you log the one failure.

Always use allOrNone = false for backfills unless you have a specific reason to require atomicity within a chunk.

Batch Size: The Trade-Off Nobody Explains Well

The batch size parameter (the second argument to Database.executeBatch) controls how many records land in each execute call. The default is 200. The maximum is 2,000.

Here's the decision framework:

Smaller batches (50–100) when your execute method does heavy work per record: callouts, complex cross-object queries, CPU-intensive calculations. Each execute call runs as its own asynchronous transaction with its own governor limits (200 SOQL queries, 150 DML statements, 60 seconds of CPU time). Fewer records per batch means you're less likely to hit those limits.

Larger batches (200–2,000) when your execute method is simple, such as setting a field value or doing light computation. More records per batch means fewer total transactions, which usually means the job finishes faster and you consume less of your org's daily asynchronous Apex allowance (250,000 executions per 24 hours, or 200 times the number of user licenses if that is greater, shared with future methods, Queueable Apex, and scheduled Apex).

If you're making callouts (Database.AllowsCallouts), the max batch size is effectively limited by the 100-callout-per-transaction limit. If each record needs one callout, your batch size can't exceed 100.
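To make the transaction-count side of this concrete, here is the arithmetic for the 2.3 million Accounts from the introduction. BatchMath is a hypothetical helper written for this illustration.

```apex
// Hypothetical helper: how many execute calls does a job need?
public class BatchMath {
    public static Integer executeCalls(Integer recordCount, Integer batchSize) {
        // Integer ceiling division: a partial final chunk still costs one call.
        return (recordCount + batchSize - 1) / batchSize;
    }
}

// Anonymous Apex:
// System.debug(BatchMath.executeCalls(2300000, 100));   // 23000
// System.debug(BatchMath.executeCalls(2300000, 200));   // 11500
// System.debug(BatchMath.executeCalls(2300000, 2000));  // 1150
```

Going from 200 to 2,000 records per batch cuts the number of transactions by a factor of ten, which is why the larger size is attractive whenever the per-record work is cheap.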

💡 A common mistake is setting the batch size to 2,000 without testing it first. Start at 200, monitor for governor limit errors, and adjust up or down based on what the AsyncApexJob record tells you.
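A cheap way to gather the data for that adjustment is to log limit consumption at the end of each execute call. The LimitsProbe class below is a hypothetical helper; the Limits methods it calls are standard platform APIs.

```apex
// Hypothetical helper: summarizes limit usage for the current transaction.
public class LimitsProbe {
    public static String snapshot() {
        return 'SOQL ' + Limits.getQueries() + '/' + Limits.getLimitQueries()
            + ', DML ' + Limits.getDmlStatements() + '/' + Limits.getLimitDmlStatements()
            + ', CPU ms ' + Limits.getCpuTime() + '/' + Limits.getLimitCpuTime()
            + ', heap ' + Limits.getHeapSize() + '/' + Limits.getLimitHeapSize();
    }
}

// As the last line of execute:
// System.debug(LoggingLevel.INFO, LimitsProbe.snapshot());
```

If the numbers stay far below their ceilings, there is room to raise the batch size; if CPU or SOQL usage creeps close to the limit, lower it.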

Monitoring Your Jobs

In the UI

Navigate to Setup → Apex Jobs to see all running, queued, and completed batch jobs. Each row shows the class name, status (Queued, Processing, Completed, Failed, Aborted), and the number of batches processed versus total.

The Apex Flex Queue (Setup → Apex Flex Queue) shows jobs waiting to start. Salesforce processes at most 5 batch jobs at a time; additional jobs wait here in Holding status (up to 100). You can reorder them on this page to change which job starts next.
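The same information is available from Apex, which is handy if you want to check for congestion before submitting a job. A minimal sketch, with BatchQueueCheck as a hypothetical class name:

```apex
// Hypothetical helper: counts batch jobs by where they are in the pipeline.
public class BatchQueueCheck {

    // Jobs that currently occupy one of the concurrent batch slots.
    public static Integer activeBatchJobs() {
        return [
            SELECT COUNT()
            FROM AsyncApexJob
            WHERE JobType = 'BatchApex'
              AND Status IN ('Queued', 'Preparing', 'Processing')
        ];
    }

    // Jobs waiting in the Flex Queue.
    public static Integer holdingBatchJobs() {
        return [
            SELECT COUNT()
            FROM AsyncApexJob
            WHERE JobType = 'BatchApex'
              AND Status = 'Holding'
        ];
    }
}
```

If activeBatchJobs() already returns 5, a newly submitted job will wait in the Flex Queue until a slot frees up.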

Programmatically via SOQL

The AsyncApexJob object is your best friend for building dashboards, sending alerts, or chaining jobs:

AsyncApexJob job = [
    SELECT Id, Status, JobItemsProcessed, TotalJobItems,
           NumberOfErrors, CreatedDate, CompletedDate,
           ExtendedStatus, ApexClass.Name
    FROM AsyncApexJob
    WHERE Id = :batchJobId
];

// Calculate progress
Decimal progress = (job.TotalJobItems == 0) ? 0 :
    (Decimal.valueOf(job.JobItemsProcessed) /
     Decimal.valueOf(job.TotalJobItems) * 100).setScale(1);

System.debug(job.ApexClass.Name + ': ' + progress + '% complete, '
             + job.NumberOfErrors + ' errors');

Building a Lightweight Monitoring Component

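Create a simple LWC or Flow that queries AsyncApexJob and displays active jobs. Because the method below is marked cacheable, an LWC that wires it has to refresh the result (for example with refreshApex on a timer) to show current progress: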

@AuraEnabled(cacheable=true)
public static List<AsyncApexJob> getActiveBatchJobs() {
    return [
        SELECT Id, ApexClass.Name, Status, JobItemsProcessed,
               TotalJobItems, NumberOfErrors, CreatedDate
        FROM AsyncApexJob
        WHERE JobType = 'BatchApex'
          AND Status IN ('Queued', 'Preparing', 'Processing', 'Holding')
        ORDER BY CreatedDate DESC
        LIMIT 20
    ];
}

Alerting on Failures

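In your finish method, always check NumberOfErrors, which counts batches that failed with an unhandled exception. Record-level failures from Database.update(scope, false) don't show up there, so check your own Database.Stateful error counter as well. If either is non-zero, send an email, post to a Slack webhook (via a Platform Event or a quick callout), or create a custom Batch_Job_Log__c record that an admin dashboard monitors.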
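A finish-time check might look like the sketch below. BatchFailureAlert is a hypothetical helper; it treats NumberOfErrors as the count of failed batches and takes record-level failures as a separate argument, on the assumption that the batch tracks those itself in a Database.Stateful counter.

```apex
// Hypothetical helper: emails a summary when a batch job had failures.
public class BatchFailureAlert {

    // Returns true if an alert was sent.
    public static Boolean notifyIfFailed(
        Id jobId,
        Integer recordErrors,
        String recipient
    ) {
        AsyncApexJob job = [
            SELECT Status, NumberOfErrors, ExtendedStatus, ApexClass.Name
            FROM AsyncApexJob
            WHERE Id = :jobId
        ];
        if (job.NumberOfErrors == 0 && recordErrors == 0) {
            return false;
        }

        Messaging.SingleEmailMessage mail = new Messaging.SingleEmailMessage();
        mail.setToAddresses(new List<String>{ recipient });
        mail.setSubject(job.ApexClass.Name + ' finished with errors');
        mail.setPlainTextBody(
            'Failed batches: ' + job.NumberOfErrors + '\n'
            + 'Failed records: ' + recordErrors + '\n'
            + 'First error: ' + job.ExtendedStatus
        );
        Messaging.sendEmail(new List<Messaging.SingleEmailMessage>{ mail });
        return true;
    }
}
```

From a stateful batch, the call in finish would be something like BatchFailureAlert.notifyIfFailed(bc.getJobId(), errorCount, 'admin@company.com').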

Chaining Batches for Multi-Step Backfills

Sometimes a backfill isn't one query — it's a sequence. First update Accounts, then recalculate Opportunities, then refresh a rollup on a custom object. You chain batches from the finish method:

public void finish(Database.BatchableContext bc) {
    Database.executeBatch(new OpportunityRecalcBatch(), 200);
}

For more complex chains, use a dispatcher pattern:

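public class BatchChain {

    private List<Database.Batchable<SObject>> steps;
    private Integer batchSize;

    public BatchChain(List<Database.Batchable<SObject>> steps, Integer batchSize) {
        this.steps = steps;
        this.batchSize = batchSize;
    }

    // Launches the next remaining step and returns its job Id, or null when
    // the chain is finished. Call it once to start the chain; after that,
    // each step has to hold a reference to this chain and call runNext()
    // from its own finish method, otherwise only the first step ever runs.
    public Id runNext() {
        if (steps.isEmpty()) {
            return null;
        }
        return Database.executeBatch(steps.remove(0), batchSize);
    }
}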

⚠️ Be careful here. Chained steps run one after another, so a chain occupies at most one of your 5 concurrent batch slots at a time, but every step has to be queued again and compete for a slot. If other processes are also submitting batch jobs, a long chain can stall between steps while the next job waits in the Flex Queue. For chains longer than 3–4 steps, consider using Queueable Apex instead, which isn't subject to the five-job batch concurrency limit.

The Trade-Offs You Should Know About

Batch Apex isn't always the right tool. Here's when it's not:

Small record counts (under 10,000). The overhead of the batch framework — job queuing, context serialization, async scheduling — can make a simple anonymous Apex script or even a Flow faster for small updates.
Real-time requirements. Batch jobs can sit in the queue for minutes (or longer during peak hours). If you need an update to happen within seconds, do it synchronously in a trigger, or use Queueable Apex, which typically starts sooner but still has no guaranteed start time.
Complex transaction boundaries. Each execute call is its own transaction. If your backfill requires that all 50,000 records either succeed together or fail together, Batch Apex can't give you that. (Though in practice, all-or-nothing at that scale is rarely the real requirement.)
Callout-heavy workloads. The 100-callout-per-transaction limit applies per execute call. If every record needs a callout, your effective batch size is ≤100, which means a million-record job takes 10,000+ transactions. At that point, evaluate whether a MuleSoft or Heroku integration might be more appropriate.

A Production-Ready Template

Here's the pattern I use on every project. It combines parameterization, error tracking, optional chaining, and notification:

public class ConfigurableBackfillBatch
    implements Database.Batchable<SObject>, Database.Stateful {

    private String query;
    private IBatchTransform transformer;
    private String notificationEmail;
    private Database.Batchable<SObject> nextBatch;
    private Integer nextBatchSize;

    private Integer totalProcessed = 0;
    private Integer totalErrors = 0;
    private List<String> sampleErrors = new List<String>();
    private static final Integer MAX_SAMPLE_ERRORS = 50;

    public ConfigurableBackfillBatch(
        String query,
        IBatchTransform transformer,
        String notificationEmail
    ) {
        this.query = query;
        this.transformer = transformer;
        this.notificationEmail = notificationEmail;
    }

    public ConfigurableBackfillBatch chainNext(
        Database.Batchable<SObject> nextBatch,
        Integer batchSize
    ) {
        this.nextBatch = nextBatch;
        this.nextBatchSize = batchSize;
        return this;
    }

    public Database.QueryLocator start(Database.BatchableContext bc) {
        return Database.getQueryLocator(query);
    }

    public void execute(Database.BatchableContext bc, List<SObject> scope) {
        transformer.apply(scope);
        List<Database.SaveResult> results = Database.update(scope, false);

        for (Integer i = 0; i < results.size(); i++) {
            if (results[i].isSuccess()) {
                totalProcessed++;
            } else {
                totalErrors++;
                if (sampleErrors.size() < MAX_SAMPLE_ERRORS) {
                    sampleErrors.add(
                        scope[i].Id + ': '
                        + results[i].getErrors()[0].getMessage()
                    );
                }
            }
        }
    }

    public void finish(Database.BatchableContext bc) {
        if (String.isNotBlank(notificationEmail)) {
            String body = 'Batch complete.\n'
                + 'Records updated: ' + totalProcessed + '\n'
                + 'Errors: ' + totalErrors + '\n';
            if (!sampleErrors.isEmpty()) {
                body += '\nSample errors:\n'
                    + String.join(sampleErrors, '\n');
            }
            Messaging.SingleEmailMessage mail =
                new Messaging.SingleEmailMessage();
            mail.setToAddresses(new List<String>{ notificationEmail });
            mail.setSubject('Backfill Job: ' + totalProcessed
                            + ' updated, ' + totalErrors + ' errors');
            mail.setPlainTextBody(body);
            Messaging.sendEmail(
                new List<Messaging.SingleEmailMessage>{ mail }
            );
        }

        if (nextBatch != null) {
            Database.executeBatch(nextBatch, nextBatchSize);
        }
    }
}

Invoke it:

Database.executeBatch(
    new ConfigurableBackfillBatch(
        'SELECT Id, BillingState FROM Account WHERE Region__c = null',
        new TerritoryTransform(),
        'admin@company.com'
    ).chainNext(new OpportunityRecalcBatch(), 200),
    200
);
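Because chainNext accepts any batchable, the next step can itself be a ConfigurableBackfillBatch, which gives you multi-step chains without extra infrastructure. The sketch below assumes a second, hypothetical transform (StageNameCleanupTransform) that reuses the stage-name example from Pattern 1.

```apex
// Hypothetical second-step transform, shown only to make the chain concrete.
public class StageNameCleanupTransform implements IBatchTransform {
    public void apply(List<SObject> records) {
        for (SObject rec : records) {
            rec.put('StageName', 'Closed Won');
        }
    }
}

// Anonymous Apex: build the last step first, then hang it off the first one.
// ConfigurableBackfillBatch stepTwo = new ConfigurableBackfillBatch(
//     'SELECT Id FROM Opportunity WHERE StageName = \'Closed/Won\'',
//     new StageNameCleanupTransform(),
//     'admin@company.com'
// );
// ConfigurableBackfillBatch stepOne = new ConfigurableBackfillBatch(
//     'SELECT Id, BillingState FROM Account WHERE Region__c = null',
//     new TerritoryTransform(),
//     'admin@company.com'
// ).chainNext(stepTwo, 200);
// Database.executeBatch(stepOne, 200);
```

Each step sends its own notification, so a failure partway through the chain is visible in the email for that step.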

One class. Reusable across any backfill. Error tracking built in. Chaining built in. Notifications built in.

Final Checklist Before You Run a Backfill

Before kicking off a batch job in production, walk through this list:

Test in a sandbox first. Use a representative data volume — not 50 records, but 50,000.
Check the Flex Queue. If there are already 5 active batch jobs, yours will queue. Timing matters.
Use Database.update(scope, false). Partial success is almost always what you want for backfills.
Cap your error logging. If you're using Database.Stateful, don't accumulate unbounded lists. Set a maximum.
Add a notification in finish. You will forget to check manually. Your finish method should tell you the job is done.
Document the job. A comment in the class header that says "Run this when: the territory model changes. Last run: 2026-03-15 by J. Smith" is worth more than you think.
Consider bypass flags. If your records have triggers or flows that shouldn't fire during a backfill, use a static variable or Custom Permission to bypass them.
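For the bypass-flag item, a static variable works because static state lives for exactly one transaction, and each execute call is its own transaction. The sketch below is one possible shape; BackfillBypass and its members are hypothetical names, and your trigger handlers have to check the flag themselves.

```apex
// Hypothetical bypass switch for automation that should not run during a backfill.
public class BackfillBypass {

    // Trigger handlers read this flag and return early when it is true.
    public static Boolean active = false;

    // Runs a partial-success update with the flag raised, then lowers it
    // again even if the update throws.
    public static List<Database.SaveResult> updateQuietly(List<SObject> records) {
        active = true;
        try {
            return Database.update(records, false);
        } finally {
            active = false;
        }
    }
}

// In a trigger handler:
// if (BackfillBypass.active) { return; }
//
// In the batch's execute method, instead of Database.update(scope, false):
// List<Database.SaveResult> results = BackfillBypass.updateQuietly(scope);
```

Set the flag inside execute, not in the batch constructor: a value set before the job is submitted does not carry over into the job's transactions.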

Batch Apex isn't glamorous. It doesn't involve AI, it won't trend on Twitter, and nobody will write a blog post about how it changed their life. But it's the difference between an org where backfills are a fire drill and an org where they're a two-line script. Build the pattern once, parameterize it, and move on to the interesting problems.
