
Don't pull the entire dump if you only need a small piece

In one of my recent projects for a Canadian company, I was tasked with generating fairly complex PDF reports in a distributed data system.

The report itself was based on data from just three tables – but those tables contained millions of rows.

The obvious problem: the database on dev/stage is huge.

Pulling a full backup (hundreds of GBs) to a local machine just to debug one reporting module is a great way to waste half a day and most of your SSD.

The naive approach: full backup or SSMS scripts

Most teams start with one of these options:

  • Restore a full database backup locally.
  • Use SSMS → Generate Scripts (Schema + Data) for the required tables.

Both options look reasonable on paper, but break down at scale:

  • A full backup:

    • Takes hours to download and restore.
    • Eats a massive chunk of disk space.
    • Includes a lot of data you simply do not need for a specific test case.
  • SSMS “Generate Scripts” for tables with millions of rows:

    • Tries to produce a gigantic INSERT script.
    • Frequently freezes or dies with OutOfMemory.
    • Even if the script is generated, executing it can take many hours.

At some point it becomes clear: we don’t need the whole mountain – just a slice of rock.

A better approach: surgical extraction with BCP

For SQL Server, my go‑to tool here is bcp (Bulk Copy Program).

Instead of moving an entire 200+ GB database around, I extract only the data slices needed for debugging and recreate them locally.

The idea is simple:

  • Keep the schema in source control (migrations, DDL scripts).
  • Use BCP to move only the relevant data between environments.

Why BCP beats SSMS for this use case

BCP is not new or flashy, but it shines when you need to move lots of rows quickly.

Key advantages:

  • Speed

    BCP works with a binary stream in native format.

    There’s no SQL text to parse, no giant INSERT statements.

    Millions of rows can be exported/imported in minutes.

  • Stability

    A console utility does not hang on a heavy UI operation or choke while rendering a huge script file.

    It either runs fast or fails with a clear error.

  • Flexibility

    You can export:

    • A whole table, or
    • The result of an arbitrary SELECT with joins and filters.

This means you can:

  • Export just the subset of data you need for a given test scenario.
  • Exclude sensitive columns or mask them directly in the SELECT.
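For example, here's a rough sketch of a masked export. The column names and masking rules are made up for illustration, and the bcp flags are explained in the workflow below:

# A sketch of masking PII in the export itself (column names are hypothetical):
# the extracted file never contains real names or emails.
bcp "SELECT c.Id,
            LEFT(c.Name, 1) + '***' AS Name,
            CONCAT('user', c.Id, '@example.invalid') AS Email
     FROM dbo.Customers c" \
    queryout customers_masked.dat -n -S your-server -T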

Practical workflow: from prod slice to local debug

In the PDF-reporting case, the flow looked like this:

  1. Prepare the schema locally
  • Apply migrations or run your DDL scripts to create the same tables locally.
  • No data yet, just structure.
  2. Export the data slice from the upstream environment

Example (conceptual):

   bcp "SELECT c.Id, c.Name, o.Id, o.Date, o.Total
        FROM dbo.Customers c
        JOIN dbo.Orders o ON o.CustomerId = c.Id
        WHERE o.Date >= '2025-01-01'"
        queryout orders_customers.dat -n -S your-server -T

Notes:

  • queryout lets you export the result of a SELECT, not just a whole table.
  • -n uses native (binary) format — fast and compact.
  • -T uses a trusted connection; replace with -U/-P if needed.
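Since the export above is a joined result set rather than a single table, the simplest target is a flattened local staging table whose column order and types match the SELECT (native-format bcp maps columns by position). A minimal sketch, assuming hypothetical types and the dbo.OrdersCustomers name used in the import below, run via sqlcmd with a trusted connection:

sqlcmd -S localhost -E -Q "
CREATE TABLE dbo.OrdersCustomers
(
    CustomerId   int            NOT NULL,  -- c.Id
    CustomerName nvarchar(200)  NOT NULL,  -- c.Name
    OrderId      int            NOT NULL,  -- o.Id
    OrderDate    datetime2      NOT NULL,  -- o.Date
    OrderTotal   decimal(18, 2) NOT NULL   -- o.Total
);"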

  3. Import into the local database

bcp dbo.OrdersCustomers in orders_customers.dat -n -S localhost -T



You can split this into several files (per table) if it better matches your local schema.
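If you go the per-table route, a small script keeps the process repeatable. A rough sketch, assuming the table list and server names are placeholders and that one machine can reach both servers:

#!/usr/bin/env bash
set -euo pipefail

# Hypothetical list of tables needed for the reporting scenario.
# If FK constraints are enabled locally, list parent tables before children.
TABLES="dbo.Customers dbo.Orders dbo.OrderLines"

for t in $TABLES; do
    file="${t//./_}.dat"                  # dbo.Orders -> dbo_Orders.dat

    # Export the whole table from the upstream environment in native format.
    bcp "$t" out "$file" -n -S your-server -T

    # Import locally; -E keeps original identity values so FKs still line up.
    bcp "$t" in "$file" -n -S localhost -T -E
done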

  4. Debug with realistic volumes

Now your local database contains:

  • The relevant tables.
  • The right data shape.
  • Realistic row counts (millions if needed).

But you never had to restore the full 200+ GB backup.
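As a quick sanity check before you start debugging (a sketch; dbo.OrdersCustomers is the hypothetical staging table from earlier), compare the local row count with what you expect from the slice:

sqlcmd -S localhost -E -Q "SELECT COUNT(*) AS ImportedRows FROM dbo.OrdersCustomers;"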

When you should consider this approach

This pattern works especially well when:

  • You have a huge SQL Server database in upper environments.
  • You need realistic data for:

    • Debugging complex business logic.
    • Reproducing production-only bugs.
    • Testing performance of reports or batch jobs.
  • You don’t want to:

    • Drag full backups around.
    • Rely on fragile UI exports.
    • Maintain a massive “test data” SQL script by hand.

BCP is not a silver bullet, but in scenarios where:

  • You know exactly which subset of data you need.
  • The volumes are too big for SSMS scripting.
  • You already have schema migrations in place.

…it can save you many hours per week and make local debugging much more pleasant.

How do you seed your local databases from large upstream environments?
Full backups, BACPACs, custom seeders, or something else?

Let me know — always curious about real-world workflows around this problem.
