DEV Community

Juliana Albertyn


My first Python project: Excel to SQL pipeline (feedback welcome)

I’ve just made my first substantial Python project public. It’s an Excel-to-SQL pipeline focused on cleaning messy spreadsheet data and preparing it for database ingestion.

This is still a work in progress, and I’m actively improving it.

Problem

Working with Excel data often means dealing with:

  • Inconsistent date and time formats
  • Mixed data types in a single column
  • Missing or malformed values
  • Subtle issues that only surface during database insertion

I wanted a reusable way to clean and standardise this kind of data before loading it into SQL.
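Mixed date formats show why this needs more than a single cast. Depending on how a workbook is read, one date column can hold `datetime` objects, text in several layouts, and bare numbers. The sketch below normalises all of these to `datetime.date`. It is a hypothetical helper, not code from the repo: `normalise_date` and `DATE_FORMATS` are illustrative names, it assumes day-first text dates, and it treats bare numbers as Excel 1900-system serial dates.

```python
import math
from datetime import date, datetime, timedelta

# Illustrative formats to try, in order; day-first is an assumption.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%b-%Y")

# Excel's 1900 date system counts days from 1899-12-30 for serials >= 61
# (earlier serials are skewed by Excel treating 1900 as a leap year).
EXCEL_EPOCH = date(1899, 12, 30)


def normalise_date(value):
    """Return a datetime.date for a messy date cell, or None if it can't be read."""
    if value is None:
        return None
    if isinstance(value, datetime):  # check before date: datetime subclasses date
        return value.date()
    if isinstance(value, date):
        return value
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        if not math.isfinite(value) or value < 61:
            return None
        try:
            # Treat bare numbers as Excel serials; any fractional time of day is dropped.
            return EXCEL_EPOCH + timedelta(days=int(value))
        except OverflowError:  # beyond the range datetime.date supports
            return None
    text = str(value).strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # try the next format
    return None
```

Under the day-first assumption an ambiguous string such as `03/04/2024` becomes 3 April, which is the kind of decision worth exposing as configuration.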

Approach

The project focuses on:

  • Column-wise cleaning functions (dates, times, text, etc.)
  • Configurable parsing with strict vs permissive modes
  • Clear error reporting with row-level context
  • Separation between cleaning logic and pipeline orchestration
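A column-wise cleaner with a strict/permissive switch might look something like the following, here for numbers stored as text. This is a sketch rather than the project’s actual code: `Mode` and `clean_number` are hypothetical names, and stripping commas assumes comma thousands separators.

```python
import math
from enum import Enum


class Mode(Enum):
    STRICT = "strict"          # invalid values raise ValueError
    PERMISSIVE = "permissive"  # invalid values are coerced to None


def clean_number(value, mode=Mode.STRICT):
    """Convert a cell that may hold a number stored as text into a float."""
    # Blank cells count as missing, not invalid, in both modes.
    if value is None or (isinstance(value, str) and not value.strip()):
        return None
    number = None
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        number = float(value)
    elif isinstance(value, str):
        try:
            # Assumes commas are thousands separators, e.g. "1,234.5".
            number = float(value.strip().replace(",", ""))
        except ValueError:
            pass
    # float() also accepts "nan" and "inf", so reject non-finite results too.
    if number is None or not math.isfinite(number):
        if mode is Mode.STRICT:
            raise ValueError(f"not a valid number: {value!r}")
        return None
    return number
```

The cleaner knows nothing about row numbers or sheet names; it only raises `ValueError`, so the orchestration layer can catch it and attach row and column context. Treating blank cells as missing rather than invalid in both modes is a design choice you may want to make configurable.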

The goal is to make the pipeline predictable and easier to debug when something goes wrong.

Example

Input (Excel):

  • Mixed date formats
  • Numbers stored as text
  • Invalid values scattered through columns

Output:

  • Cleaned, typed data ready for SQL insertion
  • Invalid values either coerced or flagged, depending on mode
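To make that input/output description concrete, here is a toy version of the flow with both modes. It is a sketch under assumptions, not the project’s code: the column names, the two accepted date formats, and the choice that strict mode holds back a whole row (reporting its sheet row number) while permissive mode keeps it with `None` in the bad cells are all illustrative.

```python
from datetime import date, datetime


def parse_date(text):
    """Accept ISO dates (2024-03-15) or day-first dates (15/03/2024)."""
    try:
        return date.fromisoformat(text)
    except ValueError:
        # Raises ValueError itself if this format does not match either.
        return datetime.strptime(text, "%d/%m/%Y").date()


def parse_amount(value):
    """Turn numbers stored as text (e.g. "1,200") into floats."""
    return float(str(value).replace(",", ""))


# Cleaning logic lives in these per-column parsers; clean() only orchestrates.
PARSERS = {"order_date": parse_date, "amount": parse_amount}

# Hypothetical rows as read from a messy sheet (header assumed on row 1).
RAW_ROWS = [
    {"order_date": "2024-03-15", "amount": "1,200"},  # number stored as text
    {"order_date": "15/03/2024", "amount": 99.5},     # different date format
    {"order_date": "sometime", "amount": "n/a"},      # invalid values
]


def clean(rows, mode):
    """permissive: bad cells become None and the row is kept.
    strict: rows with any bad cell are held back and reported instead."""
    loaded, flagged = [], []
    for sheet_row, row in enumerate(rows, start=2):
        record, problems = {}, []
        for column, parse in PARSERS.items():
            try:
                record[column] = parse(row[column])
            except ValueError as exc:
                record[column] = None
                problems.append(f"{column}: {exc}")
        if problems and mode == "strict":
            flagged.append((sheet_row, problems))
        else:
            loaded.append(record)
    return loaded, flagged
```

With these rows, permissive mode loads all three records (the last with both fields set to `None`), while strict mode loads the first two and reports sheet row 4 with one message per bad column.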

What I’m unsure about

I’d really value feedback on:

  • Code structure and modularity
  • Naming conventions and readability
  • Error handling design
  • Testing approach and coverage
  • Overall project organisation

Next step

The next phase is building the SQL writer layer so the pipeline can automatically create tables in SQL Server and populate them with the cleaned data.
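One possible starting point for that writer layer is to generate the DDL and a parameterised INSERT from a column-to-type mapping produced by the cleaning step. Everything here is an assumption for illustration: the type mapping (including the `NVARCHAR(255)` length), the function names, unqualified table names, and the idea that each column has a single Python type after cleaning. Identifiers are quoted with SQL Server’s square brackets, and the `?` placeholders follow the qmark parameter style used by drivers such as pyodbc.

```python
from datetime import date, datetime

# Assumed mapping from the Python type of a cleaned column to a SQL Server type.
SQL_SERVER_TYPES = {
    bool: "BIT",
    int: "BIGINT",
    float: "FLOAT",
    date: "DATE",
    datetime: "DATETIME2",
    str: "NVARCHAR(255)",  # placeholder length; real columns may need more
}


def quote_ident(name):
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def create_table_sql(table, columns):
    """Build CREATE TABLE DDL from an ordered {column_name: python_type} mapping."""
    definitions = [
        f"{quote_ident(name)} {SQL_SERVER_TYPES[py_type]} NULL"
        for name, py_type in columns.items()
    ]
    return f"CREATE TABLE {quote_ident(table)} (\n    " + ",\n    ".join(definitions) + "\n);"


def insert_sql(table, columns):
    """Build a parameterised INSERT with one ? placeholder per column."""
    names = ", ".join(quote_ident(name) for name in columns)
    placeholders = ", ".join("?" for _ in columns)
    return f"INSERT INTO {quote_ident(table)} ({names}) VALUES ({placeholders});"
```

The INSERT statement could then be passed, together with a list of row tuples, to a DB-API `executemany` call. Building the SQL only from quoted identifiers while passing the values as parameters keeps spreadsheet data out of the query text itself.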

Repo

https://github.com/juliana-albertyn/excel-to-sql


I’m learning as I build, so constructive criticism is very welcome.
