DEV Community

sebyx07

Ruby CSV Parsing 5-6x Faster

If you've ever had to parse large CSV files in Ruby, you know the pain. Ruby's built-in CSV library is convenient, but it's... not fast. I recently had to process some pretty large datasets and got tired of waiting, so I built something better.

The Problem

Ruby's CSV stdlib is pure Ruby. It works, it's battle-tested, but when you're parsing millions of rows, every millisecond adds up.

The Solution: zsv-ruby

I wrapped zsv, a SIMD-accelerated CSV parser written in C, into a Ruby gem. SIMD means it uses special CPU instructions to process multiple bytes at once - the same tech that makes video encoding and game physics fast.

gem install zsv

That's it. The gem downloads and compiles zsv automatically during installation, so a working C toolchain needs to be available on the machine.

How Fast Are We Talking?

Here are benchmarks from my machine (Ruby 3.4.7), shown as speedup over Ruby's CSV stdlib:

Small file (1K rows):    6.2x faster
Medium file (10K rows):  5.3x faster
Large file (100K rows):  5.0x faster

Not bad for a near drop-in replacement on the reading side.
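If you want to check numbers like these on your own machine, here is a rough sketch of a timing script. It is not the benchmark I used for the figures above: the `sample_csv` and `time_parsers` helpers and the column names are made up for illustration, and it times a single parse per size, so expect noise. It only uses `ZSV.parse` if the gem is installed.

```ruby
require 'benchmark'
require 'csv'

# zsv is optional here: if the gem is not installed, only the stdlib is timed.
begin
  require 'zsv'
rescue LoadError
  # Fall through; ZSV stays undefined.
end

# Build an in-memory CSV with a header line and `rows` data lines.
# The column names are invented for this sketch.
def sample_csv(rows)
  lines = ["id,name,score"]
  rows.times { |i| lines << "#{i},user#{i},#{i % 100}" }
  lines.join("\n") + "\n"
end

# Time one parse of `data` with each available parser.
# Returns a hash of parser name => elapsed seconds.
def time_parsers(data)
  timings = { "CSV" => Benchmark.realtime { CSV.parse(data) } }
  timings["ZSV"] = Benchmark.realtime { ZSV.parse(data) } if defined?(ZSV)
  timings
end

if __FILE__ == $PROGRAM_NAME
  [1_000, 10_000, 100_000].each do |rows|
    timings = time_parsers(sample_csv(rows))
    line = timings.map { |name, secs| format("%s %.4fs", name, secs) }.join("  ")
    puts "#{rows} rows: #{line}"
  end
end
```

For anything you plan to publish, run each size several times and compare medians rather than trusting one run.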

Usage

If you've used Ruby's CSV, you already know how to use this:

require 'zsv'

# Parse a file
ZSV.foreach("data.csv") do |row|
  puts row.inspect
end

# With headers (returns hashes instead of arrays)
ZSV.foreach("users.csv", headers: true) do |row|
  puts row["email"]
end

# Parse a string
rows = ZSV.parse("name,age\nAlice,30\nBob,25")

# Read entire file
data = ZSV.read("data.csv")

It also includes Enumerable, so you can do stuff like:

ZSV.open("users.csv", headers: true) do |parser|
  # Any Enumerable method works on the parser; values arrive as strings
  adults = parser.select { |row| row["age"].to_i >= 18 }
  puts adults.size
end

The Technical Bits

For those curious about what's under the hood:

  • Native C extension - minimal overhead between Ruby and zsv
  • Streaming parser - doesn't load the whole file into memory
  • Proper GC integration - no memory leaks, plays nice with Ruby's garbage collector
  • Cross-platform - works on Linux and macOS (including Apple Silicon)

The trickiest part was bridging zsv's callback-based API (push model) with Ruby's pull-based API (shift, each). I had to implement a row buffer and deal with some fun GC edge cases.
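To make the push-versus-pull idea concrete, here is a pure-Ruby sketch of the pattern. It is not the gem's actual code (that lives in the C extension). `ToyPushParser` and `PullReader` are hypothetical names, and the toy parser splits naively on commas, with no quoting support and ASCII input assumed. The point is only the shape: callbacks fill a buffer, and `shift` drains it, feeding more input when the buffer runs dry.

```ruby
require 'stringio'

# Toy push-style parser: you feed it chunks of text and it calls the block
# once per complete line. It only stands in for a callback-driven C parser.
class ToyPushParser
  def initialize(&on_row)
    @on_row = on_row
    @partial = +""
  end

  # Accept a chunk and emit every complete line seen so far.
  def feed(chunk)
    @partial << chunk
    while (newline = @partial.index("\n"))
      line = @partial.slice!(0..newline).chomp
      @on_row.call(line.split(","))
    end
  end

  # Emit a trailing line that had no final newline.
  def finish
    return if @partial.empty?
    @on_row.call(@partial.split(","))
    @partial = +""
  end
end

# Pull-style wrapper: callers ask for rows with #shift or #each, and the
# wrapper feeds the push parser only when its row buffer is empty.
class PullReader
  include Enumerable

  def initialize(io, chunk_size: 16)
    @io = io
    @chunk_size = chunk_size
    @buffer = []
    @done = false
    @parser = ToyPushParser.new { |row| @buffer << row }
  end

  # Return the next row, or nil once the input is exhausted.
  def shift
    fill_buffer while @buffer.empty? && !@done
    @buffer.shift
  end

  def each
    return enum_for(:each) unless block_given?
    while (row = shift)
      yield row
    end
  end

  private

  # Read one chunk and push it through the parser; callbacks fill @buffer.
  def fill_buffer
    chunk = @io.read(@chunk_size)
    if chunk.nil?
      @parser.finish
      @done = true
    else
      @parser.feed(chunk)
    end
  end
end
```

Usage looks like `PullReader.new(StringIO.new("name,age\nAlice,30\nBob,25")).each { |row| p row }`. Because one chunk can produce several rows (or none), the buffer is what lets `shift` hand back exactly one row at a time.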

When Should You Use This?

Use zsv when:

  • You're parsing large CSV files (10K+ rows)
  • Performance actually matters for your use case
  • You're already using Ruby's CSV and want a speed boost

Stick with CSV stdlib when:

  • You need CSV writing (zsv is read-only for now)
  • You're parsing tiny files where parse time doesn't matter either way
  • You need some obscure CSV option that zsv doesn't support yet
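If you want zsv where it's available and the stdlib everywhere else, one option is a small wrapper. This is a sketch, not something the gem ships: `each_csv_hash` is a hypothetical helper name, and it assumes the only difference you care about is that `ZSV.foreach` with `headers: true` yields hashes while `CSV.foreach` yields `CSV::Row` objects.

```ruby
require 'csv'

# zsv is optional in this sketch; without it we use the stdlib.
begin
  require 'zsv'
rescue LoadError
  # ZSV stays undefined and the stdlib branch below is taken.
end

# Yield each row of `path` as a Hash keyed by header name.
# `each_csv_hash` is a hypothetical helper, not part of either library.
def each_csv_hash(path)
  return enum_for(:each_csv_hash, path) unless block_given?

  if defined?(ZSV)
    # With headers: true, ZSV yields hashes (as in the usage section above).
    ZSV.foreach(path, headers: true) { |row| yield row }
  else
    # The stdlib yields CSV::Row objects; convert them for a uniform shape.
    CSV.foreach(path, headers: true) { |row| yield row.to_h }
  end
end
```

Calling code then stays the same either way, for example `each_csv_hash("users.csv") { |row| puts row["email"] }`.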

Try It Out

gem install zsv

GitHub: https://github.com/sebyx07/zsv-ruby
RubyGems: https://rubygems.org/gems/zsv

If you run into issues or have feature requests, open an issue. PRs welcome too.


Built this over a weekend with help from Claude Code. Sometimes the best solution is just wrapping a really good C library. 🤷
