Every SAP Commerce project involves data migration — whether you're migrating from a legacy platform, consolidating multiple systems, upgrading from an older Hybris version, or integrating with external data sources. The difference between a smooth go-live and a catastrophic one often comes down to how well the data migration was planned and tested.
This article covers the migration strategies, tooling, common pitfalls, and testing approaches that make the difference between a migration that works in theory and one that works in production.
Planning the Migration
Step 1: Data Inventory
Before writing any migration code, catalog what data needs to migrate:
Data Inventory Worksheet:
┌──────────────────┬──────────┬───────────┬────────────┬──────────────┐
│ Data Entity │ Source │ Volume │ Priority │ Dependencies │
├──────────────────┼──────────┼───────────┼────────────┼──────────────┤
│ Products │ PIM │ 50,000 │ Critical │ Categories │
│ Categories │ PIM │ 2,000 │ Critical │ None │
│ Prices │ ERP │ 200,000 │ Critical │ Products │
│ Stock Levels │ ERP │ 50,000 │ Critical │ Products, │
│ │ │ │ │ Warehouses │
│ Customers │ CRM │ 500,000 │ High │ None │
│ Addresses │ CRM │ 800,000 │ High │ Customers │
│ Order History │ OMS │ 2,000,000 │ Medium │ Customers, │
│ │ │ │ │ Products │
│ CMS Pages │ CMS │ 500 │ High │ Media │
│ Media/Images │ CDN/PIM │ 150,000 │ High │ None │
│ Promotions │ Legacy │ 200 │ Medium │ Products, │
│ │ │ │ │ Categories │
│ Classifications │ PIM │ 10,000 │ High │ Products │
└──────────────────┴──────────┴───────────┴────────────┴──────────────┘
Step 2: Dependency Graph
Data must be loaded in dependency order. Loading products before categories fails because products reference categories.
Load Order (dependency-based):
1. Reference Data → Countries, Currencies, Languages, Units
2. Warehouses → Warehouse definitions, PointOfService
3. Categories → Category hierarchy (parent before child)
4. Classification → ClassificationSystem, ClassAttributeAssignment
5. Products → Base products, then variants
6. Prices → Price rows linked to products
7. Stock → Stock levels linked to products + warehouses
8. Media → Product images, category images
9. Customers → Customer accounts
10. Addresses → Customer addresses
11. Orders → Historical orders (optional)
12. CMS Content → Pages, components, media
13. Promotions → Promotion rules
Step 3: Mapping Specification
Document how source fields map to SAP Commerce fields:
Product Mapping:
┌──────────────────┬──────────────────────┬──────────────────┐
│ Source Field │ SAP Commerce Field │ Transformation │
├──────────────────┼──────────────────────┼──────────────────┤
│ sku │ Product.code │ Direct │
│ title │ Product.name[en] │ Direct │
│ titel │ Product.name[de] │ Direct │
│ description_html │ Product.description │ HTML sanitize │
│ price_usd │ PriceRow.price │ Decimal (2dp) │
│ weight_lbs │ Product.weight │ Convert to kg │
│ category_path │ CategoryProductRel │ Split on '>' │
│ main_image_url │ Media.URL │ Download + store │
│ brand_name │ Product.manufacturer │ Lookup by name │
│ active │ Product.approvalStatus│ true→approved │
│ created_at │ Product.creationtime │ ISO date parse │
└──────────────────┴──────────────────────┴──────────────────┘
Automated Migration Pipeline
For large volumes, build an automated pipeline rather than running ImpEx manually.
Architecture
┌─────────────┐ ┌──────────────┐ ┌───────────────┐ ┌──────────────┐
│ Source │ │ Extract & │ │ ImpEx │ │ SAP Commerce │
│ Systems │───>│ Transform │───>│ Generator │───>│ Import │
│ (DB, API, │ │ (Python/Java)│ │ │ │ (HAC/API) │
│ CSV, etc) │ │ │ │ │ │ │
└─────────────┘ └──────────────┘ └───────────────┘ └──────────────┘
│ │ │
▼ ▼ ▼
Validation Generated Import Logs
Reports .impex files & Error Reports
Python ETL Script Example
import csv
import re
from decimal import Decimal
class ProductMigrator:
"""Transforms source product data to ImpEx format."""
IMPEX_HEADER = """
$catalogVersion = catalogVersion(catalog(id[default='myProductCatalog']),version[default='Staged'])
$supercategories = supercategories(code, $catalogVersion)
$approved = approvalStatus(code)[default='approved']
INSERT_UPDATE Product;code[unique=true];name[lang=en];description[lang=en];$catalogVersion;$supercategories;$approved;unit(code)[default='pieces'];ean;manufacturerName
"""
def __init__(self, source_file, output_file, batch_size=5000):
self.source_file = source_file
self.output_file = output_file
self.batch_size = batch_size
self.errors = []
self.processed = 0
self.skipped = 0
def transform_product(self, row):
"""Transform a single source row to ImpEx fields."""
code = self.sanitize_code(row['sku'])
if not code:
self.errors.append(f"Invalid SKU: {row.get('sku', 'EMPTY')}")
return None
name = self.sanitize_text(row.get('title', ''))
if not name:
self.errors.append(f"Missing name for SKU: {code}")
return None
description = self.sanitize_html(row.get('description_html', ''))
category = self.map_category(row.get('category_path', ''))
ean = row.get('ean', '')
manufacturer = self.sanitize_text(row.get('brand_name', ''))
return f";{code};{name};{description};;{category};;;{ean};{manufacturer}"
def sanitize_code(self, code):
"""Ensure product code is valid."""
if not code:
return None
# Remove special characters, keep alphanumeric and hyphens
return re.sub(r'[^a-zA-Z0-9\-_]', '', str(code).strip())
def sanitize_text(self, text):
"""Clean text for ImpEx (escape semicolons)."""
if not text:
return ''
text = str(text).strip()
# ImpEx uses semicolons as delimiters — escape them
text = text.replace(';', '\;')
# Remove newlines
text = text.replace('
', ' ').replace('
', '')
return text
def sanitize_html(self, html):
"""Clean HTML content for product descriptions."""
if not html:
return ''
# Remove script tags
html = re.sub(r'<script[^>]*>.*?</script>', '', html, flags=re.DOTALL)
# Escape semicolons
html = html.replace(';', '\;')
return html.strip()
def map_category(self, category_path):
"""Map source category path to SAP Commerce category code."""
# Source: "Electronics > Cameras > DSLR"
# Target: "dslr" (leaf category code)
if not category_path:
return ''
parts = [p.strip().lower().replace(' ', '-') for p in category_path.split('>')]
return parts[-1] if parts else ''
def run(self):
"""Execute the migration."""
with open(self.source_file, 'r', encoding='utf-8') as infile, \
open(self.output_file, 'w', encoding='utf-8') as outfile:
reader = csv.DictReader(infile)
outfile.write(self.IMPEX_HEADER)
for row in reader:
line = self.transform_product(row)
if line:
outfile.write(line + '
')
self.processed += 1
else:
self.skipped += 1
# Write batch separator for large imports
if self.processed % self.batch_size == 0:
outfile.write(f"
# --- Batch {self.processed // self.batch_size} ---
")
self.write_report()
def write_report(self):
"""Generate migration report."""
report = f"""
Migration Report
================
Source: {self.source_file}
Output: {self.output_file}
Processed: {self.processed}
Skipped: {self.skipped}
Errors: {len(self.errors)}
Error Details:
"""
for error in self.errors[:100]: # First 100 errors
report += f" - {error}
"
with open(self.output_file.replace('.impex', '_report.txt'), 'w') as f:
f.write(report)
print(report)
# Usage
migrator = ProductMigrator('source_products.csv', 'products_import.impex')
migrator.run()
Data Validation
Pre-Import Validation
Validate data before importing to catch issues early:
class DataValidator:
"""Validate migration data before import."""
def __init__(self):
self.errors = []
self.warnings = []
def validate_products(self, products):
seen_codes = set()
for i, product in enumerate(products):
row = i + 1
# Required fields
if not product.get('code'):
self.errors.append(f"Row {row}: Missing product code")
continue
# Duplicates
if product['code'] in seen_codes:
self.errors.append(f"Row {row}: Duplicate code '{product['code']}'")
seen_codes.add(product['code'])
# Code format
if not re.match(r'^[A-Za-z0-9\-_]+$', product['code']):
self.errors.append(
f"Row {row}: Invalid code format '{product['code']}' "
f"(only alphanumeric, hyphens, underscores)")
# Name length
name = product.get('name', '')
if len(name) > 255:
self.warnings.append(
f"Row {row}: Name exceeds 255 chars, will be truncated")
# Price validation
price = product.get('price')
if price:
try:
p = Decimal(str(price))
if p < 0:
self.errors.append(f"Row {row}: Negative price {price}")
if p > 999999.99:
self.warnings.append(f"Row {row}: Very high price {price}")
except:
self.errors.append(f"Row {row}: Invalid price format '{price}'")
# Category reference
if not product.get('category'):
self.warnings.append(f"Row {row}: No category for '{product['code']}'")
return len(self.errors) == 0
Post-Import Verification
After import, verify data integrity:
# Verification queries via FlexibleSearch
# Check product count
# Expected: 50,000
SELECT COUNT(*) FROM {Product} WHERE {catalogVersion} = (
SELECT {pk} FROM {CatalogVersion} WHERE {version} = 'Staged'
AND {catalog} = (SELECT {pk} FROM {Catalog} WHERE {id} = 'myProductCatalog')
)
# Find products without prices
SELECT {p.code} FROM {Product AS p}
WHERE {p.catalogVersion} = ?cv
AND NOT EXISTS (
SELECT 1 FROM {PriceRow AS pr} WHERE {pr.product} = {p.pk}
)
# Find products without categories
SELECT {p.code} FROM {Product AS p}
WHERE {p.catalogVersion} = ?cv
AND NOT EXISTS (
SELECT 1 FROM {CategoryProductRelation AS r} WHERE {r.target} = {p.pk}
)
# Find products without images
SELECT {p.code} FROM {Product AS p}
WHERE {p.catalogVersion} = ?cv
AND {p.picture} IS NULL
AND {p.thumbnail} IS NULL
// Programmatic verification
@Component
public class MigrationVerifier {
@Resource
private FlexibleSearchService flexibleSearchService;
public MigrationReport verify(CatalogVersionModel catalogVersion) {
MigrationReport report = new MigrationReport();
// Count totals
report.setProductCount(countItems("Product", catalogVersion));
report.setCategoryCount(countItems("Category", catalogVersion));
report.setMediaCount(countItems("Media", catalogVersion));
// Find orphans
report.setProductsWithoutPrices(findProductsWithoutPrices(catalogVersion));
report.setProductsWithoutCategories(findProductsWithoutCategories(catalogVersion));
report.setProductsWithoutImages(findProductsWithoutImages(catalogVersion));
// Validate critical fields
report.setProductsWithEmptyNames(findProductsWithEmptyField("name", catalogVersion));
return report;
}
private long countItems(String typeCode, CatalogVersionModel cv) {
String query = "SELECT COUNT(*) FROM {" + typeCode + "} WHERE {catalogVersion} = ?cv";
FlexibleSearchQuery fsq = new FlexibleSearchQuery(query);
fsq.addQueryParameter("cv", cv);
fsq.setResultClassList(Collections.singletonList(Long.class));
return flexibleSearchService.<Long>search(fsq).getResult().get(0);
}
}
Rollback Strategy
Every migration needs a rollback plan.
Database-Level Rollback
# Pre-migration: Full database backup
mysqldump -h db-host -u admin -p commerce_db > pre_migration_backup.sql
# Or for PostgreSQL
pg_dump -h db-host -U admin commerce_db > pre_migration_backup.sql
# Rollback if migration fails
mysql -h db-host -u admin -p commerce_db < pre_migration_backup.sql
ImpEx-Based Rollback
For targeted rollback, generate reverse ImpEx:
# Remove incorrectly imported products
REMOVE Product;code[unique=true];$catalogVersion
;BAD-PROD-001;
;BAD-PROD-002;
;BAD-PROD-003;
# Or remove by query
"#% impex.initScript(""
import de.hybris.platform.servicelayer.search.FlexibleSearchQuery;
query = new FlexibleSearchQuery(""SELECT {pk} FROM {Product} WHERE {creationtime} > '2024-01-15 10:00:00'"");
results = flexibleSearchService.search(query).getResult();
for (item : results) {
modelService.remove(item);
}
"");"
Best Practices
Never migrate directly to Online catalog — always import into Staged and synchronize to Online after verification.
Idempotent imports — use
INSERT_UPDATEso you can re-run imports safely if they fail partway through.Validate before importing — catch data issues in the transformation layer, not during import.
Keep source data snapshots — archive the exact source data used for each migration run. You'll need it for debugging.
Document every transformation — when you convert
weight_lbsto kilograms or mapactive=truetoapprovalStatus=approved, document it. Six months later, someone will ask why a value looks wrong.Plan for delta migrations — there's always a gap between the initial load and go-live. Plan how changes made in the source system during this window will be captured.
Test with production-scale data — migration that works with 1,000 products may fail with 500,000 due to memory, timeouts, or performance.
Have a rollback plan — and test it. A backup you've never restored is not a backup.
Top comments (0)