DEV Community

Niccolò Colombini
Niccolò Colombini

Posted on

Shopify Automation: How I Managed an 80,000-Product Catalog with Python & Pandas

Manually managing an e-commerce catalog with 80,000 products, 11 different suppliers, and 5 languages is not just inefficient — it's a direct risk to your business margins. Each supplier sends price lists in different formats (.csv, .xlsx) with inconsistently named columns, making it practically impossible to keep prices, wholesale costs, and stock availability updated by hand.

To solve this, I built a Python and Pandas workflow based on three technical pillars: backup management, dynamic price optimization, and automatic product onboarding.


1. Data Architecture: Safety and Golden Backup

Catalog integrity starts with two strategic exports from Matrixify:

  • Safety Backup (Daily): Contains only the essential columns (barcode, ID, handle, SKU, price, and quantity). This is the file processed daily for fast syncs.
  • Golden Backup (Weekly): The complete "source of truth" for the entire site, imported into a SQL database for deeper analysis and duplicate management.

Each script cross-references the supplier file with the corresponding Safety Backup, ensuring that only fresh, relevant data is touched — without ever overwriting the full catalog unnecessarily.


2. Dynamic Pricing and Margin Algorithm

The selling price calculation is not linear. It follows brand-specific business rules, handled by a single method — apply_discount_or_margin — that covers two main scenarios:

  • Standard Logic: The script starts from the wholesale cost, applies a markup, and verifies that the minimum guaranteed margin is respected.
  • Imposed Price Logic: For brands that define their own retail price, the script validates the residual margin and checks that the applied discount does not exceed contractual thresholds.
def apply_discount_or_margin(
    self,
    discount: Optional[float],   # None for brands with imposed prices
    markup: float = 0.5,
    min_margin: float = 40,       # Minimum guaranteed margin in €
    max_margin: Optional[float] = None,
    vat: float = 1.22,
    extra_discount: Optional[float] = None,
    max_discount: Optional[float] = None
) -> pd.DataFrame:
Enter fullscreen mode Exit fullscreen mode

3. Surgical Operations with Pandas

Scripts are designed to intervene only where needed, avoiding unnecessary catalog overwrites.

  • "Surgical" SEO Update: Instead of reloading the entire database, the script identifies only products with missing meta-tags. This is done via a merge(how='left') against the backup; only incomplete records are filtered and updated, keeping Shopify imports fast and lightweight.
mask = (
    backup_df["Body HTML"].isna() |
    backup_df["SEO Title"].isna() |
    backup_df["SEO Description"].isna()
)
to_update = backup_df[mask]
Enter fullscreen mode Exit fullscreen mode
  • New Product Detection: The system compares supplier barcodes against the Safety Backup. Using a merge(how='right') and filtering for right_only rows in the _merge column, the script isolates products not yet in the system and automatically populates them with prices, quantities, and SEO-ready descriptions.
new_products = supplier_df[
    ~supplier_df["Barcode"].isin(backup_df["Barcode"])
]
Enter fullscreen mode Exit fullscreen mode

4. Orchestration: Crontab, SFTP, and Matrixify

The scripts don't run manually — the entire workflow is fully automated through three components working in sync:

  1. Crontab schedules and triggers each Python script on my NAS server at defined intervals — daily for price and quantity updates, with separate jobs for SEO and new product detection.
  2. The scripts output the processed CSV files directly to a dedicated SFTP server.
  3. Matrixify is connected to the SFTP and runs on a precisely configured schedule, automatically picking up the files and importing them into Shopify.

The result is a zero-touch pipeline: from the supplier's price list landing on the server to the products being updated on Shopify, no manual intervention is needed at any stage.

[Crontab on NAS]
      ↓ triggers Python scripts
[Processed CSV files]
      ↓ deposited to SFTP
[Matrixify scheduled import]
      ↓ auto-imports into Shopify
[Live catalog updated]
Enter fullscreen mode Exit fullscreen mode

Results

This approach reduces hours of manual work to a few minutes of execution, eliminating human errors on markups and keeping the catalog continuously updated and SEO-optimized.


Do you manage an e-commerce catalog with Python, or have you built a similar workflow? Share your approach in the comments — I'd love to compare notes.

python #ecommerce #shopify #automation #pandas

Top comments (0)