In one of my recent projects for a Canadian company, I was tasked with generating fairly complex PDF reports in a distributed data system.
The report itself was based on data from just three tables – but those tables contained millions of rows.
The obvious problem: the database on dev/stage is huge.
Pulling a full backup (hundreds of GBs) to a local machine just to debug one reporting module is a great way to waste half a day and most of your SSD.
The naive approach: full backup or SSMS scripts
Most teams start with one of these options:
- Restore a full database backup locally.
- Use SSMS → Generate Scripts (Schema + Data) for the required tables.
Both options look reasonable on paper, but break down at scale:
A full backup:
- Takes hours to download and restore.
- Eats a massive chunk of disk space.
- Includes a lot of data you simply do not need for a specific test case.

SSMS “Generate Scripts” for tables with millions of rows:
- Tries to produce a gigantic INSERT script.
- Frequently freezes or dies with OutOfMemory.
- Even if the script is generated, executing it can take many hours.
At some point it becomes clear: we don’t need the whole mountain – just a slice of rock.
A better approach: surgical extraction with BCP
For SQL Server, my go‑to tool here is bcp (Bulk Copy Program).
Instead of moving an entire 200+ GB database around, I extract only the data slices needed for debugging and recreate them locally.
The idea is simple:
- Keep the schema in source control (migrations, DDL scripts).
- Use BCP to move only the relevant data between environments.
Why BCP beats SSMS for this use case
BCP is not new or flashy, but it shines when you need to move lots of rows quickly.
Key advantages:
Speed
BCP works with a binary stream in native format.
There’s no SQL text to parse, no giant INSERT statements.
Millions of rows can be exported/imported in minutes.

Stability
A console utility does not hang on a heavy UI operation or choke while rendering a huge script file.
It either runs fast or fails with a clear error.

Flexibility
You can export:
- A whole table, or
- The result of an arbitrary SELECT with joins and filters.
This means you can:
- Export just the subset of data you need for a given test scenario.
- Exclude sensitive columns or mask them directly in the SELECT (see the sketch below).
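As a rough sketch of the masking idea, the export query can replace sensitive columns with constants right in the SELECT. The table, column, and file names below are placeholders; only the bcp switches (queryout, -n, -S, -T) come from the tool itself.

```bash
# Hypothetical masked export: the literal replaces real email addresses in the output file.
# Table, column, and server names are placeholders for illustration.
bcp "SELECT c.Id, c.Name, 'masked@example.com' AS Email FROM dbo.Customers c WHERE c.Country = 'CA'" queryout customers_masked.dat -n -S your-server -T
```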
Practical workflow: from prod slice to local debug
In the PDF-reporting case, the flow looked like this:
- Prepare the schema locally
  - Apply migrations or run your DDL scripts to create the same tables locally (see the sketch below).
  - No data yet, just structure.
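For example, if the schema lives in plain DDL scripts, applying them locally can be as simple as the following sketch. The database and script names are made up; -S, -d, -E, and -i are standard sqlcmd switches.

```bash
# Hypothetical: create the empty local schema from a DDL script before importing any data.
sqlcmd -S localhost -d ReportDebug -E -i 001_create_report_tables.sql
```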
- Export the data slice from the upstream environment
Example (conceptual):
bcp "SELECT c.Id, c.Name, o.Id, o.Date, o.Total
FROM dbo.Customers c
JOIN dbo.Orders o ON o.CustomerId = c.Id
WHERE o.Date >= '2025-01-01'"
queryout orders_customers.dat -n -S your-server -T
bash
Notes:
queryout lets you export the result of a SELECT, not just a whole table.
-n uses native (binary) format — fast and compact.
-T uses trusted connection; replace with -U/-P if needed.
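If you want a file you can eyeball before importing, bcp also supports character format. A hedged variant (file, server, and table names are placeholders; -c and -t are standard switches):

```bash
# Optional: character format (-c) with an explicit field terminator (-t) gives a
# human-readable file for spot checks, at the cost of size and speed.
bcp "SELECT TOP (100) o.Id, o.Date, o.Total FROM dbo.Orders o" queryout orders_sample.txt -c -t"|" -S your-server -T
```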
- Import into local database

```bash
bcp dbo.OrdersCustomers in orders_customers.dat -n -S localhost -T
```
You can split this into several files (per table) if it better matches your local schema.
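A per-table variant might look like the sketch below. Table names, file names, and the batch size are assumptions for illustration; out/in, -n, -E (keep identity values), and -b (batch size) are standard bcp options.

```bash
# Hypothetical per-table round trip; adjust names to your schema.
bcp dbo.Customers out customers.dat -n -S your-server -T
bcp dbo.Orders out orders.dat -n -S your-server -T

# Import locally: -E preserves identity values, -b commits in batches of 50000 rows.
bcp dbo.Customers in customers.dat -n -E -b 50000 -S localhost -T
bcp dbo.Orders in orders.dat -n -E -b 50000 -S localhost -T
```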
- Debug with realistic volumes

Now your local database contains:
- The relevant tables.
- The right data shape.
- Realistic row counts (millions if needed).
But you never had to restore the full 200+ GB backup.
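A quick count after the import is a cheap way to confirm the slice landed as expected; the database and table names below are placeholders.

```bash
# Hypothetical sanity check on the local copy.
sqlcmd -S localhost -d ReportDebug -E -Q "SELECT COUNT(*) AS OrderRows FROM dbo.Orders"
```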
When you should consider this approach
This pattern works especially well when:
- You have a huge SQL Server database in upper environments.
- You need realistic data for:
  - Debugging complex business logic.
  - Reproducing production-only bugs.
  - Testing performance of reports or batch jobs.
- You don’t want to:
  - Drag full backups around.
  - Rely on fragile UI exports.
  - Maintain a massive “test data” SQL script by hand.

BCP is not a silver bullet, but in scenarios where:
- You know exactly which subset of data you need.
- The volumes are too big for SSMS scripting.
- You already have schema migrations in place.
…it can save you many hours per week and make local debugging much more pleasant.
How do you seed your local databases from large upstream environments?
Full backups, BACPACs, custom seeders, or something else?
Let me know — always curious about real-world workflows around this problem.