If you've ever had to parse large CSV files in Ruby, you know the pain. Ruby's built-in CSV library is convenient, but it's... not fast. I recently had to process some pretty large datasets and got tired of waiting, so I built something better.
The Problem
Ruby's CSV stdlib is pure Ruby. It works, it's battle-tested, but when you're parsing millions of rows, every millisecond adds up.
The Solution: zsv-ruby
I wrapped zsv, a SIMD-accelerated CSV parser written in C, in a Ruby gem. SIMD means it uses special CPU instructions to process multiple bytes at once, the same tech that makes video encoding and game physics fast.
gem install zsv
That's it. The gem downloads and compiles zsv automatically during installation.
How Fast Are We Talking?
Here are some real benchmarks from my machine (Ruby 3.4.7), comparing zsv against Ruby's CSV stdlib:
Small file (1K rows): 6.2x faster
Medium file (10K rows): 5.3x faster
Large file (100K rows): 5.0x faster
Not bad for a drop-in replacement on the reading side.
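If you want to check numbers like these on your own machine, here's a minimal sketch of a timing script. It is not the benchmark I ran for the figures above: the helper names (write_sample_csv, time_parsers), the sample columns, and the 10K row count are made up for illustration. zsv is loaded optionally, so the script still runs with only the stdlib installed.

```ruby
require "benchmark"
require "csv"
require "tempfile"

# zsv is optional here so the script still runs without the gem installed.
begin
  require "zsv"
rescue LoadError
  warn "zsv not installed; timing the CSV stdlib only"
end

# Hypothetical helper: writes a throwaway CSV with `rows` data rows.
def write_sample_csv(rows)
  file = Tempfile.new(["bench", ".csv"])
  file.puts "id,name,email"
  rows.times { |i| file.puts "#{i},user#{i},user#{i}@example.com" }
  file.close # close without deleting, so the parsers can read it by path
  file
end

# Hypothetical helper: returns wall-clock seconds per parser, keyed by name.
def time_parsers(path)
  timings = {}
  timings[:csv] = Benchmark.realtime { CSV.foreach(path) { |_row| } }
  timings[:zsv] = Benchmark.realtime { ZSV.foreach(path) { |_row| } } if defined?(ZSV)
  timings
end

sample = write_sample_csv(10_000)
timings = time_parsers(sample.path)
timings.each { |name, secs| puts format("%-4s %.4fs", name, secs) }
puts format("speedup: %.1fx", timings[:csv] / timings[:zsv]) if timings[:zsv]
sample.unlink
```

A single run like this is noisy, so treat the ratio as a rough indication and repeat it a few times before trusting it.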
Usage
If you've used Ruby's CSV, you already know how to use this:
require 'zsv'
# Parse a file
ZSV.foreach("data.csv") do |row|
  puts row.inspect
end
# With headers (returns hashes instead of arrays)
ZSV.foreach("users.csv", headers: true) do |row|
  puts row["email"]
end
# Parse a string
rows = ZSV.parse("name,age\nAlice,30\nBob,25")
# Read entire file
data = ZSV.read("data.csv")
It also includes Enumerable, so you can do stuff like:
ZSV.open("users.csv", headers: true) do |parser|
  adults = parser.select { |row| row["age"].to_i >= 18 }
end
The Technical Bits
For those curious about what's under the hood:
- Native C extension - minimal overhead between Ruby and zsv
- Streaming parser - doesn't load the whole file into memory
- Proper GC integration - no memory leaks, plays nice with Ruby's garbage collector
- Cross-platform - works on Linux and macOS (including Apple Silicon)
The trickiest part was bridging zsv's callback-based API (push model) with Ruby's pull-based API (shift, each). I had to implement a row buffer and deal with some fun GC edge cases.
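To make the push-versus-pull idea concrete, here's a rough pure-Ruby sketch of the row-buffer pattern. This is not the gem's actual C code, and it has none of the GC concerns: ToyPushParser and PullReader are hypothetical names, and the toy parser is deliberately naive (no quoting support). The point is only the shape: the parser pushes rows into a buffer through a callback, and shift feeds it more input only when the buffer runs dry.

```ruby
require "stringio"

# Stand-in for a push-style parser: you feed it bytes and it calls back
# once per completed row. Naive on purpose (no quoting support).
class ToyPushParser
  def initialize(&on_row)
    @on_row = on_row
    @pending = +""
  end

  def feed(chunk)
    @pending << chunk
    while (newline = @pending.index("\n"))
      line = @pending.slice!(0..newline).chomp
      @on_row.call(line.split(","))
    end
  end

  # Flush a final row that has no trailing newline.
  def finish
    @on_row.call(@pending.split(",")) unless @pending.empty?
    @pending = +""
  end
end

# Pull-style wrapper: rows pushed by the parser land in a buffer, and
# shift reads more input only when that buffer is empty.
class PullReader
  include Enumerable

  CHUNK_SIZE = 16 # tiny so the buffering is easy to observe

  def initialize(io)
    @io = io
    @buffer = []
    @done = false
    @parser = ToyPushParser.new { |row| @buffer << row }
  end

  # Returns the next row, or nil at end of input.
  def shift
    fill_buffer while @buffer.empty? && !@done
    @buffer.shift
  end

  def each
    return enum_for(:each) unless block_given?

    while (row = shift)
      yield row
    end
  end

  private

  def fill_buffer
    chunk = @io.read(CHUNK_SIZE)
    if chunk
      @parser.feed(chunk)
    else
      @parser.finish
      @done = true
    end
  end
end

reader = PullReader.new(StringIO.new("name,age\nAlice,30\nBob,25"))
p reader.shift # → ["name", "age"]
p reader.to_a  # → [["Alice", "30"], ["Bob", "25"]]
```

Because each is built on shift, including Enumerable gives select, map, and friends for free, and only one chunk's worth of rows sits in memory at a time.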
When Should You Use This?
Use zsv when:
- You're parsing large CSV files (10K+ rows)
- Performance actually matters for your use case
- You're already using Ruby's CSV and want a speed boost
Stick with CSV stdlib when:
- You need CSV writing (zsv is read-only for now)
- You're parsing tiny files where the overhead doesn't matter
- You need some obscure CSV option that zsv doesn't support yet
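If you want both, one option is to read with zsv when it's available and keep CSV for everything else. Here's a minimal sketch of that split. The helper names (each_record, write_records) are hypothetical, and it assumes ZSV.foreach with headers: true yields hashes, as in the usage examples above.

```ruby
require "csv"

# zsv is optional: fall back to the stdlib when the gem is missing.
begin
  require "zsv"
rescue LoadError
  # CSV stdlib handles reading below
end

# Hypothetical helper: yields each data row as a Hash keyed by header.
def each_record(path, &block)
  if defined?(ZSV)
    ZSV.foreach(path, headers: true, &block)
  else
    # CSV yields CSV::Row objects, so convert them to plain hashes.
    CSV.foreach(path, headers: true) { |row| block.call(row.to_h) }
  end
end

# Hypothetical helper: writing always goes through CSV,
# since zsv is read-only for now.
def write_records(path, headers, records)
  CSV.open(path, "w") do |csv|
    csv << headers
    records.each { |record| csv << record.values_at(*headers) }
  end
end
```

Callers only ever see hashes, so swapping the reader later doesn't touch the rest of the code.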
Try It Out
gem install zsv
GitHub: https://github.com/sebyx07/zsv-ruby
RubyGems: https://rubygems.org/gems/zsv
If you run into issues or have feature requests, open an issue. PRs welcome too.
Built this over a weekend with help from Claude Code. Sometimes the best solution is just wrapping a really good C library. 🤷