filesql: SQL Driver for CSV, TSV, LTSV, Parquet, and Excel Files in Go

#showdev #sql #database #go

Why I Created FileSQL

I previously built sqly and sqluv - both CLI tools for running SQL on CSV/TSV files. After maintaining these projects, I realized I was duplicating the same file-handling logic across both tools. So I thought: why not extract this functionality into a reusable library?

The key insight was building on Go's standard database/sql package. Every Go developer knows how to use database/sql - it's the de facto standard for database operations. Because FileSQL hands you an ordinary *sql.DB, it is instantly usable for anyone who's written database code in Go.

Back when I was building sqly and sqluv, I was constantly dealing with massive CSV files that needed complex transformations. Importing them into a real database was overkill, but processing them with basic tools was painful. I know many developers face this same struggle, so I wanted to create a library that makes this task trivial.

How to Use

Here's a real-world example that shows FileSQL's power:

// Analyze sales data across multiple file formats
db, err := filesql.Open(
    "sales_2024.csv",    // Current year sales
    "products.xlsx",     // Product master in Excel
    "customers.parquet", // Customer data in Parquet
    "regions.tsv",       // Region mapping
)
if err != nil {
    log.Fatal(err)
}
defer db.Close() // Release the database handle when done

// Complex analytical query with multiple JOINs and aggregations
rows, err := db.Query(`
    WITH monthly_sales AS (
        SELECT 
            strftime('%Y-%m', s.date) as month,
            s.product_id,
            s.customer_id,
            s.amount,
            r.region_name
        FROM sales_2024 s
        JOIN regions r ON s.region_code = r.code
        WHERE s.amount > 0
    )
    SELECT 
        ms.month,
        p.category,
        ms.region_name,
        c.customer_segment,
        COUNT(DISTINCT ms.customer_id) as unique_customers,
        SUM(ms.amount) as total_revenue,
        AVG(ms.amount) as avg_transaction
    FROM monthly_sales ms
    JOIN products p ON ms.product_id = p.id
    JOIN customers c ON ms.customer_id = c.id
    GROUP BY ms.month, p.category, ms.region_name, c.customer_segment
    HAVING SUM(ms.amount) > 10000
    ORDER BY ms.month DESC, total_revenue DESC
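`)
if err != nil {
    log.Fatal(err)
}
defer rows.Close() // Always close the result set after reading it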
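
Notice that the query refers to each file by a table name that matches its file name without the extension (sales_2024.csv becomes sales_2024, regions.tsv becomes regions). As a rough sketch of that mapping, and not FileSQL's actual implementation, the hypothetical helper tableName below strips one compression suffix and one format suffix from the base name. The suffix lists are taken from the formats mentioned in this post; multi-sheet Excel workbooks are out of scope for this sketch.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// tableName is a hypothetical helper (not part of FileSQL's API).
// It derives a SQL table name from a file path by dropping the directory,
// then one compression extension, then one format extension.
func tableName(path string) string {
	name := filepath.Base(path)

	// Strip a single compression suffix first, e.g. ".gz".
	for _, ext := range []string{".gz", ".bz2", ".xz", ".zst"} {
		if strings.HasSuffix(strings.ToLower(name), ext) {
			name = name[:len(name)-len(ext)]
			break
		}
	}

	// Then strip a single format suffix, e.g. ".csv".
	for _, ext := range []string{".csv", ".tsv", ".ltsv", ".parquet", ".xlsx"} {
		if strings.HasSuffix(strings.ToLower(name), ext) {
			name = name[:len(name)-len(ext)]
			break
		}
	}
	return name
}

func main() {
	for _, p := range []string{"sales_2024.csv", "data/regions.tsv", "logs.ltsv.gz"} {
		fmt.Printf("%s -> %s\n", p, tableName(p))
	}
}
```

If you are unsure which table names a set of files will produce, a quick check like this before writing the JOINs can save a round of "no such table" errors.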

Key Features

Multiple file formats: CSV, TSV, LTSV, Parquet, Excel (XLSX) - all queryable with SQL
Compression support: Automatically handles .gz, .bz2, .xz, .zst compressed files
Stream processing: Handles gigabyte-sized files efficiently through configurable chunk processing
Flexible input sources: Files, directories, io.Reader, embed.FS - whatever you have
Auto-save: Changes can be persisted back to the original files
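
To make the compression and stream-processing ideas concrete, here is a minimal sketch using only the Go standard library. It is not FileSQL's internal code: maybeDecompress, readChunks, chunkSizes, and gzipBytes are hypothetical names invented for this illustration, and only .gz is handled because the standard library has no xz or zstd decoder. The sketch picks a decompressor from the file name, then reads CSV rows from any io.Reader in fixed-size chunks so the whole file never has to sit in memory.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/csv"
	"errors"
	"fmt"
	"io"
	"log"
	"strings"
)

// maybeDecompress wraps r in a gzip reader when name ends in ".gz".
// Hypothetical helper; other codecs are omitted from this sketch.
func maybeDecompress(name string, r io.Reader) (io.Reader, error) {
	if !strings.HasSuffix(name, ".gz") {
		return r, nil
	}
	zr, err := gzip.NewReader(r)
	if err != nil {
		return nil, err
	}
	return zr, nil
}

// readChunks reads the header row, then passes data rows to handle in
// batches of at most chunkSize. The rows slice is reused between calls,
// so handle must not keep a reference to it.
func readChunks(r io.Reader, chunkSize int, handle func(header []string, rows [][]string) error) error {
	if chunkSize <= 0 {
		return errors.New("chunkSize must be positive")
	}
	cr := csv.NewReader(r)
	header, err := cr.Read()
	if err != nil {
		return fmt.Errorf("read header: %w", err)
	}
	chunk := make([][]string, 0, chunkSize)
	for {
		rec, err := cr.Read()
		if errors.Is(err, io.EOF) {
			break
		}
		if err != nil {
			return err
		}
		chunk = append(chunk, rec)
		if len(chunk) == chunkSize {
			if err := handle(header, chunk); err != nil {
				return err
			}
			chunk = chunk[:0]
		}
	}
	if len(chunk) > 0 {
		return handle(header, chunk)
	}
	return nil
}

// chunkSizes reports how many data rows each chunk contained.
func chunkSizes(name string, data []byte, chunkSize int) ([]int, error) {
	r, err := maybeDecompress(name, bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	var sizes []int
	err = readChunks(r, chunkSize, func(_ []string, rows [][]string) error {
		sizes = append(sizes, len(rows))
		return nil
	})
	return sizes, err
}

// gzipBytes compresses s in memory so the demo needs no files on disk.
func gzipBytes(s string) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write([]byte(s)); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := gzipBytes("id,amount\n1,10\n2,20\n3,30\n")
	if err != nil {
		log.Fatal(err)
	}
	sizes, err := chunkSizes("sales.csv.gz", data, 2)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(sizes) // [2 1]: three data rows split into chunks of two
}
```

In a real loader the handler would insert each chunk into the backing database instead of counting rows; the chunk size is the knob that trades memory for the number of insert batches.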

Current Phase

Right now I'm deep in the unglamorous work: squashing bugs, improving robustness, and refactoring code for maintainability. Once this foundation is solid, I'll move on to feature enhancements and push toward the v1.0.0 release.

The roadmap has all the details, but the focus is on stability first, features second.

I'll be honest - I originally built this just to peek at a handful of local files. But then I realized people wanted to use it for ETL pipelines and more serious data processing, so I'm evolving FileSQL to meet those needs too.

Join the Journey

I want to build FileSQL with the community, not in isolation. There are many ways to contribute:

Share your data challenges: What weird CSV formats are you dealing with? Send me sample files (anonymized, of course).
Suggest use cases: How would you use FileSQL in your projects?
Report bugs: Found something broken? Let me know!
Improve docs: Is something confusing? Help me clarify it
Spread the word: Share on social media, write a blog post, tell your team
Code contributions: Yes, using LLMs for coding is totally fine - I embrace all modern development approaches

Even a simple star on GitHub helps more than you know.

My Life Goals

This might sound dramatic, but FileSQL is part of my bucket list:

Reach 1,000 GitHub stars on a project I created
Get my first GitHub Sponsor

These aren't just vanity metrics for me - they represent validation that I've built something genuinely useful for the community. FileSQL is my best shot at achieving these dreams.

By the way, most of my other bucket list items have nothing to do with development, like 'put an arcade cabinet in my room' and 'display Toraya's "96-piece mini yokan set" on my desk and feel smugly satisfied about it'.