Oddshop

Originally published at oddshop.work

Marketplace Job Listings Data Cleaner

We just released Marketplace Job Listings Data Cleaner, a tool that cleans and deduplicates scraped Amazon job listings from CSV or JSON files.

What it does

This tool processes raw, messy scraped job data from Amazon's careers pages. It's for Python developers and data analysts who need stable, structured datasets for analysis. It handles common scraping inconsistencies like duplicate entries, broken HTML, and varying date formats.

Features

  • Deduplicate listings by job ID and title — removes exact and fuzzy duplicates
  • Standardize date formats — converts various string formats to ISO 8601
  • Clean HTML artifacts — strips tags and normalizes whitespace from description fields
  • Validate and structure location data — parses city, state, country into separate columns
  • Export to clean CSV or JSON — outputs a consistent, analysis-ready file
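To make the cleaning steps above concrete, here is a minimal standard-library sketch of how they could work. It is not the tool's actual code: the function names, the `job_id` and `title` field names, the accepted date formats, and the 0.9 similarity threshold are all assumptions chosen for illustration.

```python
import html
import re
from datetime import datetime
from difflib import SequenceMatcher

# Assumed input date formats; the real tool may accept a different set.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y", "%d %b %Y")


def to_iso_date(raw):
    """Return an ISO 8601 date string, or None if no assumed format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def strip_html(text):
    """Drop tags, decode entities, and collapse runs of whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", html.unescape(no_tags)).strip()


def split_location(raw):
    """Split 'City, State, Country' into three fields; missing parts become None."""
    parts = [p.strip() for p in raw.split(",")]
    parts += [None] * (3 - len(parts))
    return dict(zip(("city", "state", "country"), parts[:3]))


def dedupe(rows, threshold=0.9):
    """Keep the first row per job ID, then drop rows with near-identical titles.

    'job_id' and 'title' are hypothetical field names. Comparing titles alone
    is a simplification: it would also merge same-titled jobs in different cities.
    """
    kept = []
    seen_ids = set()
    for row in rows:
        if row["job_id"] in seen_ids:
            continue  # exact duplicate by ID
        title = row["title"].lower()
        if any(
            SequenceMatcher(None, title, k["title"].lower()).ratio() >= threshold
            for k in kept
        ):
            continue  # fuzzy duplicate by title
        seen_ids.add(row["job_id"])
        kept.append(row)
    return kept
```

Each helper handles one field, so they can be applied row by row before export. The pairwise title comparison in `dedupe` is quadratic in the number of rows, which is fine for a few thousand listings but would need blocking or hashing for larger scrapes.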

Usage

amazon_job_cleaner --input messy_listings.csv --output clean_listings.json

Requirements

Python 3.8+. Install dependencies:

pip install -r requirements.txt

Get it

Download Marketplace Job Listings Data Cleaner for $29 →

Buy once, use anywhere. ZIP includes the full script, README, and usage examples.


Built by OddShop — Python automation tools, one new release every week.
