Fully automated daily sync of high-volume affiliate feeds (Connexity, Shopping24) into Microsoft Merchant Center: OOM-safe, chunked upload, 100 % compliant.
The challenge
Blender Networks Inc. runs a large price-comparison portal whose monetization is built entirely on Microsoft Advertising Product Listing Ads (PLAs). The previous in-house feed solution was unstable and had to be replaced.
The complication came from the heterogeneous third-party sources: Connexity delivers zipped JSON bundles via an API index, while Shopping24 (S24) provides a CSV master file plus fragment updates over FTP. Both formats had to be translated flawlessly into Microsoft Merchant Center's strictly defined TSV schema. Every format error and every "item drop" meant direct revenue loss.
Additional difficulty: Connexity data routinely exceeds 2 GB per publisher account, the mapped output runs into double-digit gigabytes, and the full daily upload into Microsoft Merchant Center sits at around 200 GB. Naive in-memory processing was off the table; the pipeline had to stay OOM-safe even on modest hardware.
The approach
Three heterogeneous sources, one unified pipeline. The architecture follows a strict three-stage model, Ingest → Map → Upload, where each affiliate network sits behind the same interface but internally taps its own stack (see pipeline diagram below).
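As a rough illustration of the three-stage model, the sketch below shows how several sources could sit behind one interface while each stage stays individually switchable. Only the `--skip-ingest`, `--skip-map`, and `--skip-upload` flags are taken from the project; `FeedSource`, `DemoSource`, `build_parser`, `run_pipeline`, and the file names `raw.json` and `feed.txt` are hypothetical, and the sketch runs sources sequentially rather than in parallel.

```python
import argparse
import json
from pathlib import Path
from typing import Callable, Iterable, Protocol


class FeedSource(Protocol):
    """Hypothetical contract that every affiliate-network module fulfils."""

    name: str

    def ingest(self, workdir: Path) -> None:
        """Fetch raw data and persist it under workdir."""

    def map(self, workdir: Path) -> None:
        """Read the persisted raw data and write TSV output under workdir."""


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Ingest -> Map -> Upload")
    for stage in ("ingest", "map", "upload"):
        # argparse exposes --skip-ingest as args.skip_ingest, and so on.
        parser.add_argument(f"--skip-{stage}", action="store_true")
    return parser


def run_pipeline(
    sources: Iterable[FeedSource],
    upload: Callable[[Path], None],
    workdir: Path,
    args: argparse.Namespace,
) -> dict[str, str]:
    """Run the enabled stages per source; a failing source does not stop the rest."""
    report: dict[str, str] = {}
    for source in sources:
        src_dir = workdir / source.name
        src_dir.mkdir(parents=True, exist_ok=True)
        try:
            if not args.skip_ingest:
                source.ingest(src_dir)
            if not args.skip_map:
                source.map(src_dir)
            if not args.skip_upload:
                upload(src_dir)
            report[source.name] = "ok"
        except Exception as exc:  # isolate failures per source
            report[source.name] = f"failed: {exc!r}"
    return report


class DemoSource:
    """Toy source standing in for a real network module."""

    name = "demo"

    def ingest(self, workdir: Path) -> None:
        raw = [{"id": "1", "title": "Lamp"}]
        (workdir / "raw.json").write_text(json.dumps(raw))

    def map(self, workdir: Path) -> None:
        items = json.loads((workdir / "raw.json").read_text())
        rows = ["id\ttitle"] + [f"{i['id']}\t{i['title']}" for i in items]
        (workdir / "feed.txt").write_text("\n".join(rows) + "\n")
```

Because every stage persists its result to disk instead of handing it over in memory, `build_parser().parse_args(["--skip-ingest"])` can re-run mapping and upload against the raw files of an earlier run.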
- Streaming-first ingestion. A Python pipeline using `ijson` parses the zipped JSON bundles item by item directly off the stream instead of deserializing the whole document into RAM. Memory usage stays constant regardless of feed size.
- Disk-backed deduplication. A SQLite table with `PRAGMA` tuning handles cross-account deduplication on disk. Multi-million-item feeds stay clean without blowing up the heap, and the dedup state survives an aborted run, ready for inspection.
- Strict mapping to the MS spec. Every source record is deterministically mapped onto the required `id/title/description/link/image_link/price/...` schema. Validation logic filters unusable brand strings (purely numeric, too long, too many words), checks GTINs for valid lengths (8/12/13/14), and sets `identifier_exists` consistently, which eliminates "partial identifier" warnings.
- 15 GB chunking & chunk-aware upload. Mapping outputs are auto-split at 15 GB into `_0.txt`, `_1.txt`, … to match the Microsoft upload limit. The uploader detects these chunks via pattern matching and numbers them correctly on the remote side (`TipDigest_US_0.txt`, `TipDigest_US_1.txt`, …).
- High-speed SFTP via LFTP. Instead of Python SFTP libraries (paramiko), the system LFTP binary is driven with an enlarged TCP socket buffer and a reconnect strategy. On multi-GB files this proved significantly faster and more stable than a pure-Python SFTP implementation.
- Multi-account orchestration. Multiple Connexity publishers and multiple Merchant Center accounts (US/DE) are processed in parallel; each account writes to its own output path and is correctly routed via a store-mapping config.
- Resilient daily sync. A cron lockfile prevents overlapping runs. Pipeline stages can be toggled individually (`--skip-ingest`, `--skip-map`, `--skip-upload`) for targeted re-runs after partial failures, without re-pulling the entire 2 GB download.
- Proactive error handling & diagnostics. Hybrid logging (Rich console with emojis for humans + `RotatingFileHandler` for the machine), a per-stage execution-report table, and per-account isolation. A single truncated JSON stream doesn't stop the overall run: the error is logged locally and the run continues with the rest.
- Modular architecture. Each affiliate network lives in its own module (`ingest_*.py` + `mapper_*.py`) behind a unified pipeline interface. A third source (Kelkoo) was added without touching Connexity or S24, evidence that the abstraction holds.
- Acceptance criterion. The hard acceptance bar (5+ consecutive days of error-free automation) was passed on the first attempt, secured by clean module boundaries and consistent validation layering.
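The disk-backed deduplication from the list above could look roughly like the following minimal sketch. The table layout, the helper names `open_dedup_db` and `dedupe`, and the specific `PRAGMA` values are assumptions for illustration; the article does not document the actual settings.

```python
import sqlite3
from typing import Iterable, Iterator


def open_dedup_db(path: str) -> sqlite3.Connection:
    """Open (or create) the on-disk dedup store."""
    conn = sqlite3.connect(path)
    # Assumed tuning values; the project's real PRAGMAs are not documented.
    conn.execute("PRAGMA journal_mode = WAL")
    conn.execute("PRAGMA synchronous = NORMAL")
    conn.execute("PRAGMA cache_size = -65536")  # negative value = size in KiB
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen ("
        " item_id TEXT PRIMARY KEY,"
        " account TEXT NOT NULL)"
    )
    return conn


def dedupe(
    conn: sqlite3.Connection,
    account: str,
    items: Iterable[dict],
    batch_size: int = 10_000,
) -> Iterator[dict]:
    """Yield only items whose id was not seen before in any account."""
    pending = 0
    for item in items:
        cur = conn.execute(
            "INSERT OR IGNORE INTO seen (item_id, account) VALUES (?, ?)",
            (item["id"], account),
        )
        if cur.rowcount == 1:  # row inserted, so the id is new
            yield item
        pending += 1
        if pending >= batch_size:
            conn.commit()  # periodic commits keep the state usable after an abort
            pending = 0
    conn.commit()
```

Since `dedupe` is a generator, it composes with a streaming parser: items flow through one at a time, and only the SQLite file grows with the feed, not the Python heap.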
The diagram shows the full data flow: from heterogeneous source systems (zipped JSON streams, REST API, FTP dumps), through the OOM-safe ingest layer, the validation-driven mapping with 15 GB chunking, all the way to the chunk-aware LFTP upload into Microsoft Merchant Center.
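The validation step in the mapping layer might be sketched as follows. The GTIN lengths (8/12/13/14) and the three brand filters come from the description above; the thresholds `MAX_BRAND_CHARS` and `MAX_BRAND_WORDS`, the function names, and the policy of emitting brand and GTIN only as a pair are assumptions. The sketch checks GTIN length only, not the check digit, and returns `identifier_exists` as a boolean, leaving the literal value required by the feed spec to the serializer.

```python
import re
from typing import Optional

VALID_GTIN_LENGTHS = {8, 12, 13, 14}  # lengths named in the article
MAX_BRAND_CHARS = 70  # assumed threshold
MAX_BRAND_WORDS = 5   # assumed threshold

_DIGITS = re.compile(r"[0-9]+")


def clean_brand(raw: Optional[str]) -> Optional[str]:
    """Return a normalized brand, or None if the string is unusable."""
    brand = " ".join((raw or "").split())  # collapse whitespace
    if not brand:
        return None
    if _DIGITS.fullmatch(brand.replace(" ", "")):  # purely numeric
        return None
    if len(brand) > MAX_BRAND_CHARS or len(brand.split()) > MAX_BRAND_WORDS:
        return None
    return brand


def clean_gtin(raw: Optional[str]) -> Optional[str]:
    """Return the GTIN digits if the length is valid, else None."""
    digits = re.sub(r"[\s-]", "", raw or "")
    if _DIGITS.fullmatch(digits) and len(digits) in VALID_GTIN_LENGTHS:
        return digits
    return None


def apply_identifiers(record: dict) -> dict:
    """Emit brand and GTIN together or not at all (assumed policy)."""
    brand = clean_brand(record.get("brand"))
    gtin = clean_gtin(record.get("gtin"))
    out = {k: v for k, v in record.items() if k not in ("brand", "gtin")}
    if brand and gtin:
        out.update(brand=brand, gtin=gtin, identifier_exists=True)
    else:
        out["identifier_exists"] = False
    return out
```

Emitting the identifiers only as a complete pair is one way to avoid records that carry a brand without a GTIN or the reverse; a production mapper would likely also consider MPN.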
The result
- 100 % feed compliance: no more disapprovals from format errors.
- Zero errors in the daily sync: the acceptance criterion was passed on the first try.
- ROI protected: no revenue loss from broken feed updates.
- OOM-safe on multi-GB feeds: the pipeline runs on modest hardware without memory pressure.
- Modularly extensible: new affiliate networks integrate without touching the core.
