I’ve just made my first substantial Python project public. It’s an Excel-to-SQL pipeline focused on cleaning messy spreadsheet data and preparing it for database ingestion.
This is still a work in progress, and I’m actively improving it.
Problem
Working with Excel data often means dealing with:
- Inconsistent date and time formats
- Mixed data types in a single column
- Missing or malformed values
- Subtle issues that only surface during database insertion
I wanted a reusable way to clean and standardise this kind of data before loading it into SQL.
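To make the date problem concrete, here is a minimal sketch (not code from the repo; `try_parse` and the sample values are illustrative) of how mixed date formats in one column defeat a single-format parser, and how trying a list of known formats surfaces malformed values early instead of at insert time:

```python
from datetime import datetime

# Hypothetical sample of the kinds of values one Excel column can contain
raw_dates = ["2024-01-15", "15/01/2024", "Jan 15 2024", "n/a"]

# Formats to attempt, in order; a value matching none becomes None
FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%b %d %Y"]

def try_parse(value):
    for fmt in FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    return None  # malformed value is caught here, not during SQL insertion

parsed = [try_parse(v) for v in raw_dates]
```

All three valid spellings of the same date parse to proper `datetime` objects, while `"n/a"` is cleanly marked missing.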
Approach
The project focuses on:
- Column-wise cleaning functions (dates, times, text, etc.)
- Configurable parsing with strict vs permissive modes
- Clear error reporting with row-level context
- Separation between cleaning logic and pipeline orchestration
The goal is to make the pipeline predictable and easier to debug when something goes wrong.
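As a rough sketch of what strict vs permissive modes with row-level error context can look like (the function name and signature here are hypothetical, not the repo's actual API):

```python
def clean_column(values, parser, mode="permissive"):
    """Apply `parser` to every cell in a column.

    strict mode: raise on the first bad value, including its row index.
    permissive mode: substitute None and collect (row, value, error) tuples.
    """
    cleaned, errors = [], []
    for row, value in enumerate(values):
        try:
            cleaned.append(parser(value))
        except ValueError as exc:
            if mode == "strict":
                raise ValueError(f"row {row}: {exc}") from exc
            cleaned.append(None)
            errors.append((row, value, str(exc)))
    return cleaned, errors
```

Because the parser is passed in, the same orchestration works for dates, numbers, or text, which is one way to keep cleaning logic separate from pipeline wiring.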
Example
Input (Excel):
- Mixed date formats
- Numbers stored as text
- Invalid values scattered through columns
Output:
- Cleaned, typed data ready for SQL insertion
- Invalid values either coerced or flagged, depending on mode
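A toy version of the coerce-vs-flag behaviour for numbers stored as text might look like this (again a hypothetical helper, not the repo's implementation):

```python
def coerce_number(value, mode="coerce"):
    """Turn a text cell into a float.

    'coerce' mode: invalid values become None.
    'flag' mode: invalid values are returned tagged for later review.
    """
    try:
        # Strip whitespace and thousands separators before converting
        return float(str(value).replace(",", "").strip())
    except ValueError:
        if mode == "flag":
            return ("INVALID", value)
        return None

cells = ["1,234.5", " 42 ", "abc"]
coerced = [coerce_number(c) for c in cells]
flagged = [coerce_number(c, mode="flag") for c in cells]
```

The coerced column is ready for a numeric SQL type; the flagged variant preserves the bad cell so it can be reported instead of silently dropped.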
What I’m unsure about
I’d really value feedback on:
- Code structure and modularity
- Naming conventions and readability
- Error handling design
- Testing approach and coverage
- Overall project organisation
Next step
The next phase is building the SQL writer layer so the pipeline can automatically create tables in SQL Server and populate them with the cleaned data.
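One possible shape for that layer, sketched here purely as an assumption about the design (the type map and `build_create_table` helper are invented for illustration): infer a Python type per cleaned column, map it to a SQL Server type, and emit the DDL before bulk-inserting the rows.

```python
# Hypothetical mapping from inferred Python types to SQL Server column types
TYPE_MAP = {int: "INT", float: "FLOAT", str: "NVARCHAR(255)"}

def build_create_table(table, columns):
    """Emit a CREATE TABLE statement.

    columns: list of (name, python_type) pairs for the cleaned data.
    """
    cols = ", ".join(f"[{name}] {TYPE_MAP[pytype]}" for name, pytype in columns)
    return f"CREATE TABLE [{table}] ({cols});"

ddl = build_create_table("orders", [("id", int), ("amount", float), ("note", str)])
```

Generating the DDL from the cleaned, typed data keeps the table schema in sync with whatever the cleaning stage actually produced.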
Repo
https://github.com/juliana-albertyn/excel-to-sql
I’m learning as I build, so constructive criticism is very welcome.