Every Shopify developer hits this wall eventually.
You need to export 200,000 orders. Or sync a 500K-product catalog. Or run a price update across every variant you carry. Standard GraphQL queries collapse under that pressure. Rate limits fire. Timeouts pile up. Your integration breaks.
The Shopify Bulk Operations API exists to fix this. It processes millions of records asynchronously, returns output as a downloadable JSONL file, and sidesteps the throttling that kills traditional query approaches.
Here is how it works, and how to scale it properly.
What Is the Shopify Bulk Operations API?
It is a subset of the Shopify GraphQL Admin API built for large-scale data retrieval and mutation. Instead of paginating through thousands of API calls, you submit one GraphQL operation and Shopify runs it server-side in the background.
When the job completes, Shopify generates a JSONL file available at a signed URL. Each line in the file is one resource object.
Two operation types exist:
| Operation Type | What It Does | Common Use Case |
|---|---|---|
| bulkOperationRunQuery | Exports data asynchronously | Orders export, catalog dump, customer list |
| bulkOperationRunMutation | Applies mutations to a large dataset | Price updates, tag writes, metafield updates |
Both follow the same lifecycle: create, poll, download.
The Operation Lifecycle
Step 1: Submit
Send a bulkOperationRunQuery mutation. Shopify queues the job and returns an operation ID with status CREATED.
```graphql
mutation {
  bulkOperationRunQuery(
    query: """
    {
      products {
        edges {
          node {
            id
            title
            variants {
              edges {
                node {
                  id
                  price
                  sku
                }
              }
            }
          }
        }
      }
    }
    """
  ) {
    bulkOperation {
      id
      status
    }
    userErrors {
      field
      message
    }
  }
}
```
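As a rough sketch of how that mutation might be sent from Python, the helper below builds the HTTP request without sending it. The endpoint path and the X-Shopify-Access-Token header follow the usual Admin API conventions; the function name, shop domain, token, and API version are placeholders for illustration.

```python
import json
import urllib.request


def build_graphql_request(
    shop_domain: str, access_token: str, api_version: str, query: str
) -> urllib.request.Request:
    """Build (but do not send) a POST request to the GraphQL Admin API."""
    url = f"https://{shop_domain}/admin/api/{api_version}/graphql.json"
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "X-Shopify-Access-Token": access_token,
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends it. Keeping the build step separate makes the request easy to inspect in tests.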
Step 2: Poll for Status
Query currentBulkOperation at intervals until status reaches a terminal state (COMPLETED, FAILED, CANCELED, or EXPIRED).
```graphql
query {
  currentBulkOperation {
    id
    status
    errorCode
    createdAt
    completedAt
    objectCount
    fileSize
    url
    partialDataUrl
  }
}
```
Poll every 3 to 10 seconds for small jobs. For large datasets, back off to 30 to 60 seconds.
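A minimal polling loop along those lines might look like this. It assumes a hypothetical fetch_operation callable that runs the status query and returns the operation as a dict; the delay values mirror the ranges above and are starting points, not tuned numbers.

```python
import time
from typing import Callable, Dict

# Statuses after which polling should stop.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELED", "EXPIRED"}


def poll_until_done(
    fetch_operation: Callable[[], Dict],
    initial_delay: float = 3.0,
    max_delay: float = 60.0,
    factor: float = 2.0,
    max_polls: int = 500,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll with a growing delay until the operation reaches a terminal status."""
    delay = initial_delay
    for _ in range(max_polls):
        operation = fetch_operation()
        if operation["status"] in TERMINAL_STATUSES:
            return operation
        sleep(delay)
        # Back off toward the ceiling so long jobs are polled less often.
        delay = min(delay * factor, max_delay)
    raise TimeoutError("bulk operation did not reach a terminal status")
```

The sleep parameter is injectable so the loop can be tested without real waiting.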
Step 3: Download and Parse
When COMPLETED, the url field contains a signed link valid for 7 days. Parse the JSONL line by line. Parent-child relationships are encoded via __parentId on child nodes.
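To illustrate the __parentId linkage, here is a small sketch that attaches child lines to their parents. It assumes each parent line appears before its children, and it keeps everything in a dict purely for demonstration; for large files you would write to a database instead.

```python
import json
from typing import Dict, Iterable


def group_by_parent(lines: Iterable[str]) -> Dict[str, dict]:
    """Attach each child record to its parent using the __parentId field."""
    parents: Dict[str, dict] = {}
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        record = json.loads(raw)
        parent_id = record.pop("__parentId", None)
        if parent_id is None:
            # Top-level resource: start an empty child list for it.
            record["children"] = []
            parents[record["id"]] = record
        else:
            # Assumption: the parent line was already seen earlier in the file.
            parents[parent_id]["children"].append(record)
    return parents
```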
Status reference:
| Status | Meaning | Action |
|---|---|---|
| CREATED | Queued, not yet running | Keep polling |
| RUNNING | Actively processing | Keep polling |
| COMPLETED | Done | Download file |
| FAILED | Error encountered | Check errorCode, retry |
| CANCELED | Manually or auto-canceled | Resubmit if needed |
| CANCELING | Cancel in progress | Wait, then resubmit |
Why This Beats Standard Pagination at Scale
Standard GraphQL pagination uses cursor-based after arguments. Fine for 10,000 records. For 1 million, even at the maximum page size of 250 you make at least 4,000 sequential API calls and burn through your query cost budget fast.
Bulk Operations bypass per-request rate limits almost entirely. Shopify does the work server-side. You wait, then download one file.
| Factor | Standard Pagination | Bulk Operations API |
|---|---|---|
| Rate Limit Exposure | High | Very low |
| Max Records | Limited by throttle | Millions |
| Processing Model | Synchronous | Asynchronous |
| Error Recovery | Per-request | Job-level |
| Data Format | JSON in response | JSONL file download |
| Best For | Up to ~50K records | 50K to millions |
Key Constraints You Must Know
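One operation of each type per store at a time
Shopify caps concurrent bulk operations. Historically the limit has been one bulk query and one bulk mutation per app per store at any moment, so check the limit for the API version you target. Submitting another operation of the same type while one is still running fails immediately. In a multi-tenant app, build per-store operation locks into your job queue.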
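One way to sketch such a lock is an in-process registry like the one below. The class name and key format are hypothetical. The key could be the store domain alone, or the store plus the operation type. A multi-worker deployment would need a shared store (a database row or a Redis key) rather than process memory.

```python
import threading
from typing import Set


class StoreOperationLocks:
    """Tracks which keys currently have a bulk operation in flight."""

    def __init__(self) -> None:
        self._active: Set[str] = set()
        self._guard = threading.Lock()

    def try_acquire(self, key: str) -> bool:
        """Return True if the slot was free and is now held by the caller."""
        with self._guard:
            if key in self._active:
                return False
            self._active.add(key)
            return True

    def release(self, key: str) -> None:
        """Free the slot once the operation reaches a terminal status."""
        with self._guard:
            self._active.discard(key)
```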
No nested mutations
Bulk mutations require a flat JSONL input file. Each line maps to one mutation call. Nested mutations inside a single bulk operation are not supported.
Stream the output file
JSONL output can reach several gigabytes. Never load it fully into memory. Use line-by-line buffered reads.
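A minimal sketch of a line-by-line read, assuming the data arrives as a binary file-like object (a local file opened in binary mode, or an HTTP response body). The function name is illustrative.

```python
import json
from typing import BinaryIO, Iterator


def iter_jsonl(stream: BinaryIO) -> Iterator[dict]:
    """Yield one parsed record per non-empty line without reading the whole stream."""
    for raw in stream:
        line = raw.strip()
        if line:
            yield json.loads(line)
```

Because this is a generator, only one line is held in memory at a time. The response object returned by urllib.request.urlopen can be iterated the same way.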
Check partialDataUrl on failure
When a job fails or gets canceled, Shopify may still produce a partialDataUrl containing whatever completed before failure. Always check this field. Process the partial data, then retry only the remaining records.
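The check can be as small as the sketch below, which takes the operation as a dict shaped like the polling query's result. The function name and the (url, is_partial) return shape are illustrative choices.

```python
from typing import Optional, Tuple


def pick_result_url(operation: dict) -> Tuple[Optional[str], bool]:
    """Return (url, is_partial) for whichever result file is available."""
    if operation.get("status") == "COMPLETED" and operation.get("url"):
        return operation["url"], False
    if operation.get("partialDataUrl"):
        # Failed or canceled job that still produced some output.
        return operation["partialDataUrl"], True
    return None, False
```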
Bulk Mutations: The Real Power Move
Bulk mutations handle operations like price updates, tag management, and metafield writes across millions of records.
The flow:
1. Stage an upload via stagedUploadsCreate (resource BULK_MUTATION_VARIABLES) to get a signed upload URL and its form parameters.
2. Upload your JSONL input file to that URL as a multipart form POST. Each line is a JSON object with variables for one mutation call.
{"input": {"id": "gid://shopify/ProductVariant/123456789", "price": "29.99"}}
3. Submit bulkOperationRunMutation, passing the staged upload path (the key parameter returned in step 1) as stagedUploadPath.
Output results include a __lineNumber field so you can map every success and failure back to your input file precisely.
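Here is a hedged sketch of both ends of that flow: building the input lines and mapping results back to them. It assumes __lineNumber is zero-based and that each output line carries a data object whose payloads include userErrors; verify both against real output before relying on them. The function names are made up for illustration.

```python
import json
from typing import Iterable, List, Tuple


def build_input_lines(price_updates: Iterable[Tuple[str, str]]) -> List[str]:
    """One JSON object per line, each holding the variables for one mutation call."""
    return [
        json.dumps({"input": {"id": variant_id, "price": price}})
        for variant_id, price in price_updates
    ]


def split_results(
    input_lines: List[str], output_lines: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Sort input lines into (succeeded, failed) using __lineNumber."""
    succeeded: List[str] = []
    failed: List[str] = []
    for raw in output_lines:
        record = json.loads(raw)
        # Assumption: __lineNumber is a zero-based index into the input file.
        source_line = input_lines[record["__lineNumber"]]
        payloads = (record.get("data") or {}).values()
        has_errors = bool(record.get("errors")) or any(
            payload is None or payload.get("userErrors") for payload in payloads
        )
        (failed if has_errors else succeeded).append(source_line)
    return succeeded, failed
```

The failed list can be written straight back out as the input file for a retry run.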
Scaling Patterns for Production Systems
Pattern 1: Webhook-Triggered Completion
Subscribe to the BULK_OPERATIONS_FINISH webhook topic instead of polling. Shopify pushes the operation ID and final status to your endpoint when the job ends. Query the operation by that ID to fetch the download URL. Your system stays idle until Shopify calls you.
Make your webhook handler idempotent. Shopify can fire the same completion event more than once.
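A bare-bones illustration of idempotent handling: remember which operation IDs were already processed and skip repeats. The admin_graphql_api_id payload key is an assumption about the webhook body, and the in-memory set stands in for a persistent store with a unique constraint.

```python
from typing import Callable, Set


class CompletionHandler:
    """Runs on_complete at most once per bulk operation ID."""

    def __init__(self, on_complete: Callable[[dict], None]) -> None:
        self._seen: Set[str] = set()
        self._on_complete = on_complete

    def handle(self, payload: dict) -> bool:
        """Return True if the event was processed, False if it was a duplicate."""
        # Assumption: the payload identifies the operation under this key.
        operation_id = payload["admin_graphql_api_id"]
        if operation_id in self._seen:
            return False
        self._seen.add(operation_id)
        self._on_complete(payload)
        return True
```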
Pattern 2: Job Orchestration Layer
For apps managing bulk operations across hundreds or thousands of merchant stores, build a layer that:
- Queues bulk operation submissions per store
- Tracks the active operation ID per store in a database
- Handles completions via webhooks
- Retries failed jobs with exponential backoff
- Logs partialDataUrl before discarding failed results
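The core of such a layer could be sketched as below: a per-store queue, one active job per store, and retries with exponentially growing delays. All names are hypothetical, and state lives in memory here only to keep the example short; a real system would persist it.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional


@dataclass
class Job:
    name: str
    attempts: int = 0


class Orchestrator:
    def __init__(self, max_attempts: int = 3, base_delay: float = 30.0) -> None:
        self.queues: Dict[str, Deque[Job]] = defaultdict(deque)
        self.active: Dict[str, Job] = {}
        self.max_attempts = max_attempts
        self.base_delay = base_delay

    def enqueue(self, store: str, job: Job) -> None:
        self.queues[store].append(job)

    def start_next(self, store: str) -> Optional[Job]:
        """Hand out the next job only if the store has nothing in flight."""
        if store in self.active or not self.queues[store]:
            return None
        job = self.queues[store].popleft()
        self.active[store] = job
        return job

    def finish(self, store: str, status: str) -> Optional[float]:
        """Record completion; return a retry delay in seconds if the job is requeued."""
        job = self.active.pop(store)
        if status == "COMPLETED":
            return None
        job.attempts += 1
        if job.attempts >= self.max_attempts:
            return None  # Give up; the caller should log and alert.
        self.queues[store].appendleft(job)
        return self.base_delay * 2 ** (job.attempts - 1)
```

The finish method is what a webhook handler would call, and the returned delay tells a scheduler when to call start_next again.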
Pattern 3: Chunked Mutation Strategy
Even though bulk mutations support millions of records, split very large jobs into chunks of 50K to 100K records per operation. Smaller jobs complete faster, fail cheaper, and produce lighter output files.
Track your last successfully completed chunk in a persistent state store. On failure, resume from the checkpoint rather than reprocessing everything.
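A compact sketch of chunking with a checkpoint follows. The submit callable is a hypothetical stand-in for "run one bulk mutation and wait for it to finish", raising on failure, and the checkpoint dict stands in for a persistent state store.

```python
from typing import Callable, Dict, Iterator, List, Sequence, Tuple


def chunks(records: Sequence, size: int) -> Iterator[Tuple[int, List]]:
    """Yield (chunk_index, chunk) pairs of at most `size` records each."""
    for start in range(0, len(records), size):
        yield start // size, list(records[start:start + size])


def run_chunks(
    records: Sequence,
    size: int,
    submit: Callable[[List], None],
    checkpoint: Dict[str, int],
) -> None:
    """Submit chunks in order, skipping any already recorded as completed."""
    resume_from = checkpoint.get("last_completed", -1) + 1
    for index, chunk in chunks(records, size):
        if index < resume_from:
            continue
        submit(chunk)  # Raises on failure, leaving the checkpoint untouched.
        checkpoint["last_completed"] = index
```

Calling run_chunks again after a failure picks up at the first chunk that has not been recorded as complete.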
Pattern 4: JSONL Processing Pipeline
Build a streaming parser that:
- Reads the output file line by line
- Reconstructs parent-child relationships via __parentId
- Writes records to your database or downstream system
- Tracks failed lines separately for targeted retries
Use worker processes separate from your web layer for this step. Monitor memory consumption carefully on multi-gigabyte files.
Error Codes and Retry Logic
| Error Code | Cause | Action |
|---|---|---|
| ACCESS_DENIED | Missing API scope | Update OAuth scopes |
| INTERNAL_SERVER_ERROR | Shopify-side failure | Retry with exponential backoff |
| TIMEOUT | Query too complex or dataset too large | Simplify query, chunk inputs |
| TOO_MANY_FILE_STORAGE_REQUESTS | Too many staged uploads in flight | Throttle upload submissions |
For timeouts: simplify the query first. Remove fields your downstream system does not use. A leaner query fixes most timeouts without reducing dataset scope.
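In code, the error table can become a small lookup so retry logic stays in one place. The action labels are made-up names for whatever handlers your system defines, and unknown codes fall through to manual investigation.

```python
from typing import Optional

# Maps the error codes from the table above to an internal action label.
ACTIONS = {
    "ACCESS_DENIED": "fix_scopes",
    "INTERNAL_SERVER_ERROR": "retry_with_backoff",
    "TIMEOUT": "simplify_query_or_chunk",
    "TOO_MANY_FILE_STORAGE_REQUESTS": "throttle_uploads",
}


def next_action(error_code: Optional[str]) -> str:
    """Pick the follow-up action for a finished operation's errorCode."""
    if error_code is None:
        return "none"
    return ACTIONS.get(error_code, "investigate")
```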
Performance Benchmarks
| Dataset Size | Typical Completion Time | Approximate File Size |
|---|---|---|
| 10,000 products | 30 to 90 seconds | 5 to 15 MB |
| 100,000 orders | 3 to 8 minutes | 100 to 300 MB |
| 500,000 customers | 15 to 40 minutes | 500 MB to 2 GB |
| 1M+ line items | 30 to 90 minutes | 2 to 10 GB |
These are estimates, not guarantees. Always build for the upper end of each range. Never assume a fixed completion time in production logic.
App Architecture Implications
Bulk operations shift your bottleneck. You stop hammering the API with thousands of requests. Instead, you make a few API calls, then process a large local file. The bottleneck moves from network throughput to local compute and storage I/O.
Plan accordingly:
- Use worker processes separate from your web layer
- Write processed results to a fast intermediate store (Redis, PostgreSQL) before your final destination
- Use streaming HTTP clients that do not buffer the full response body
- Alert on jobs stuck in RUNNING state beyond expected thresholds
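The stuck-job alert can start as a simple age check like this sketch. The two-hour default is an arbitrary placeholder. Pick a threshold from your own observed completion times.

```python
from datetime import datetime, timedelta, timezone


def is_stuck(
    status: str,
    created_at: datetime,
    now: datetime,
    threshold: timedelta = timedelta(hours=2),
) -> bool:
    """True when an operation has been RUNNING longer than the threshold."""
    return status == "RUNNING" and (now - created_at) > threshold
```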
Separate real-time event pipelines from batch bulk operation pipelines entirely. Use event-driven patterns for real-time updates. Reserve the bulk pipeline for scheduled or triggered large-batch jobs.
Wrapping Up
The Shopify Bulk Operations API is the right tool any time your data volume pushes past what paginated GraphQL can handle cleanly. The lifecycle is simple. The constraints are manageable. The architecture patterns are proven.
Build it with webhook-triggered completion, idempotent processing, chunked mutations, and streaming JSONL parsing, and you get a system that handles enterprise-scale data without breaking under load.
Originally published on KolachiTech: https://kolachitech.com/bulk-operations-api-at-scale/
KolachiTech is a Shopify-focused development agency specializing in API architecture, ERP integrations, and scalable app development. If you need a production-grade bulk operations system built for your store or app, get in touch.