nchika

Posted on Dec 10, 2025

A Minimal Go Toolkit for Cleaning, Validating, and Querying CSV/TSV/Excel/Parquet Files

#sql #go #showdev

In many software systems, not all data lives inside a database.
Sometimes it’s stored in structured files such as CSVs, TSVs, or spreadsheets. And in practice, these files are not always clean. Data may be entered manually by someone who isn’t familiar with the format, values may be missing, and rows may contain inconsistencies that break downstream processing.

When the file is large, trying to “fix it in Excel” often leads to crashes — or simply makes it difficult to identify where the invalid records are.

I faced these issues repeatedly in real projects, and eventually decided to build a set of libraries to address them. Since I’m a Go developer, adopting the Python ETL ecosystem was never my first choice.

Instead, I created three small Go libraries that focus on preprocessing, validation, and lightweight analysis of structured file data:

fileprep — preprocessing + field-level validation using struct tags
fileframe — a tiny, immutable DataFrame for filtering and inspection
filesql — run SQL directly on CSV/TSV/LTSV/Excel/Parquet via embedded SQLite

These libraries are independent from one another.
What connects them is the shared use of the io.Reader interface.
By passing data as a familiar io.Reader, you don’t have to think about any custom data structures inside the libraries — you can work with the standard Go primitives you already know.

Now, let me explain the characteristics of each library.

fileprep

Features

Preprocessing: trim, replace, Unicode normalization, type coercion, etc.
Clear error reporting: identifies which row and which column caused the failure
Supports composite validation (cross-column rules)
Works with CSV/TSV/LTSV/Parquet/Excel — anything fed through io.Reader

Example

type User struct {
    Name  string `prep:"trim" validate:"required"`
    Email string `prep:"trim,lowercase"`
    Age   string
}

func main() {
    csvData := `name,email,age
  John Doe  ,JOHN@EXAMPLE.COM,30
Jane Smith,jane@example.com,25
`

    processor := fileprep.NewProcessor(fileprep.FileTypeCSV)
    var users []User

    reader, result, err := processor.Process(strings.NewReader(csvData), &users)
    if err != nil {
        fmt.Printf("Error: %v\n", err)
        return
    }

    fmt.Printf("Processed %d rows, %d valid\n", result.RowCount, result.ValidRowCount)

    for _, user := range users {
        fmt.Printf("Name: %q, Email: %q\n", user.Name, user.Email)
    }

    // reader can be passed directly to filesql
    _ = reader
}

Output:

Processed 2 rows, 2 valid
Name: "John Doe", Email: "john@example.com"
Name: "Jane Smith", Email: "jane@example.com"

fileframe

Features

Immutable DataFrame
Filtering, mapping, grouping
Ideal for “one-time transformations” on small/medium CSV/TSV datasets

Example

// Sample sales data
csvData := `product,amount,category
Apple,100,Fruit
Banana,150,Fruit
Carrot,80,Vegetable
Orange,120,Fruit
Broccoli,90,Vegetable`

df, err := fileframe.NewDataFrame(strings.NewReader(csvData), fileframe.CSV)
if err != nil {
    fmt.Println("Error:", err)
    return
}

fmt.Printf("Total rows: %d\n", df.Len())
fmt.Printf("Columns: %v\n", df.Columns())

// Filter
filtered := df.Filter(func(row map[string]any) bool {
    amount, ok := row["amount"].(int64)
    return ok && amount > 100
})
fmt.Printf("Rows with amount > 100: %d\n", filtered.Len())

// GroupBy + Sum
groupedDf, err := df.GroupBy("category")
if err != nil {
    fmt.Println("Error:", err)
    return
}
grouped, err := groupedDf.Sum("amount")
if err != nil {
    fmt.Println("Error:", err)
    return
}

for _, row := range grouped.ToRecords() {
    fmt.Printf("  %s: %.0f\n", row["category"], row["sum_amount"])
}

Output:

Total rows: 5
Columns: [product amount category]
Rows with amount > 100: 2
Fruit: 370
Vegetable: 170

filesql

filesql is not “SQL on files” in the literal sense.
Internally, it loads your data into a temporary SQLite database, giving you SQL capabilities without requiring you to manage the database yourself.

Example

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()

    db, err := filesql.OpenContext(ctx, "data.csv")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    rows, err := db.QueryContext(ctx, "SELECT * FROM data WHERE age > 25")
    if err != nil {
        log.Fatal(err)
    }
    defer rows.Close()

    for rows.Next() {
        var name string
        var age int
        if err := rows.Scan(&name, &age); err != nil {
            log.Fatal(err)
        }
        fmt.Printf("Name: %s, Age: %d\n", name, age)
    }
}

If you're a Go developer who doesn’t need the full Apache or Python ETL ecosystems, these three lightweight libraries may fit your workflow well.

The user base is still small, so there may be undiscovered bugs — feedback and issue reports are very welcome.

DEV Community

A Minimal Go Toolkit for Cleaning, Validating, and Querying CSV/TSV/Excel/Parquet Files

fileprep

Features

Example

fileframe

Features

Example

filesql

Example

Top comments (0)