Tilo Sloboda

Posted on Mar 16 • Edited on Jun 13

Switch from Ruby CSV to SmarterCSV in 5 Minutes

#ruby #csv #rails #programming

Why switch? → 10 Ways Ruby's CSV.read Can Silently Corrupt or Lose Your Data

In this article we'll explore how easy it is to switch from Ruby CSV to SmarterCSV — often just a single line change.

But we'll also go beyond the basics and look at advanced scenarios where SmarterCSV really shines: parallel processing with Sidekiq, streaming imports directly from S3, production-grade instrumentation, and resumable imports that survive deployments mid-file. These are patterns that Ruby's built-in CSV library can't handle without you building all the plumbing from scratch.

Here's how to make the switch in 5 minutes.

Ruby's built-in CSV library works, but it's slow, and its default output is arrays of arrays, where row data is disassociated from the headers. Your code has to correlate values with column names by hand, which adds risk and boilerplate. The result doesn't lend itself to direct use with ActiveRecord, Sidekiq, or any hash-based workflow: you always need a post-processing step to get usable data.

SmarterCSV returns Rails-ready hashes with symbol keys, automatic numeric conversion, and whitespace stripping, all out of the box with sensible defaults. No boilerplate code, and in the benchmarks below it runs 1.7×–8.6× faster than CSV.read while returning cleaned-up arrays of hashes.

How Much Faster? 🚀

SmarterCSV is designed for real-world CSV processing — the full pipeline including hash construction, key normalization, and type conversion, not just raw tokenization.

Comparison	Range
vs `CSV.read` (arrays only, no post-processing)	1.7×–8.6× faster
vs `CSV.table` (closest equivalent output)	7×–129× faster

Benchmarks: 19 CSV files (20k–80k rows), Ruby 3.4.7, Apple M1, SmarterCSV 1.16.0.

The CSV.table comparison is the fair one: both produce rows with symbol keys. CSV.read returns raw arrays, so the post-processing work your application still needs to do is not included in that number, which understates the real cost difference.

Step 1: Install

# Gemfile
gem 'smarter_csv'

bundle install

Step 2: The One-Line Switch

Most developers use CSV.read with headers: true, which returns a CSV::Table of CSV::Row objects with string keys. To get usable hashes, you need to call .map(&:to_h), and you still have string keys, no type conversion, and no whitespace stripping.

Consider this real-world CSV file — messy headers, extra columns without headers, a trailing comma:

$ cat data.csv
   First Name  , Last Name , Age
Alice , Smith,  30, VIP, Gold ,
Bob, Jones,  25

Before: Ruby CSV

rows = CSV.read('data.csv', headers: true).map(&:to_h)
rows.first
# => {"   First Name  " => "Alice ", " Last Name " => " Smith", " Age" => "  30", nil => ""}
#                                                                                   ^^^ "VIP" and "Gold" silently lost!

Whitespace-polluted keys, Age as a string, and every extra column competes for the same nil key — the last one wins, the rest are silently discarded.

After: SmarterCSV

rows = SmarterCSV.process('data.csv')
rows.first
# => {first_name: "Alice", last_name: "Smith", age: 30, column_1: "VIP", column_2: "Gold"}
#    trailing empty field dropped, no data loss

Clean symbol keys, whitespace stripped, age converted to Integer, extra columns named — no data loss.

That's it. No .map(&:to_h), no header_converters:, no manual post-processing.

Step 3: Know the Differences

SmarterCSV's defaults are designed for real-world use. Here's what changes and what to watch for:

String keys → Symbol keys

CSV.read returns string keys by default. SmarterCSV returns symbol keys, which are more efficient (much lower memory usage and faster), as well as idiomatic for Rails. If you genuinely need string keys, you could still add strings_as_keys: true.

Before: Ruby CSV

rows = CSV.read('data.csv', headers: true).map(&:to_h)
rows.first['name']   # => "Alice"

After: SmarterCSV

# symbol keys (the default)
rows = SmarterCSV.process('data.csv')
rows.first[:name]    # => "Alice"

# string keys — if you don't mind the memory impact
rows = SmarterCSV.process('data.csv', strings_as_keys: true)
rows.first['name']   # => "Alice"

Numeric conversion is automatic

CSV.read returns every value as a string, which is rarely what downstream code wants. SmarterCSV converts "42" → 42 and "3.14" → 3.14 automatically.

Watch out for columns where leading zeros matter (ZIP codes, phone numbers, account numbers) and exclude them from conversion:

Before: Ruby CSV

rows = CSV.read('data.csv', headers: true).map(&:to_h)
rows.first['age']    # => "30"  (string)

After: SmarterCSV

# numeric strings converted automatically
rows = SmarterCSV.process('data.csv')
rows.first[:age]     # => 30  (Integer)

# Exclude columns where leading zeros matter
rows = SmarterCSV.process('data.csv',
  convert_values_to_numeric: { except: [:zip_code, :phone, :account_number] })
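
To see why such columns need to be excluded, here is a minimal plain-Ruby sketch of what any string-to-integer conversion does to a ZIP code. It does not call SmarterCSV; the sample value is made up for illustration.

```ruby
# What numeric conversion does to an identifier with a leading zero.
zip = '02134'                  # sample ZIP code as it would appear in the CSV

as_integer = zip.to_i          # String#to_i parses base 10 and ignores the leading zero
round_trip = as_integer.to_s   # converting back cannot restore the lost digit

puts as_integer.inspect        # 2134
puts round_trip.inspect        # "2134"
puts round_trip == zip         # false
```

Once the value has been stored as an Integer, the original width is unknown, so padding it back is guesswork. Excluding the column up front keeps the string intact.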

Empty values are removed

SmarterCSV drops key/value pairs where the value is blank, and returns cleaned-up data hashes. CSV.read keeps them as nil.

Before: Ruby CSV

rows = CSV.read('sample.csv', headers: true, header_converters: :symbol).map(&:to_h)
rows[1]   # => { name: "Bob", age: "25", city: nil }

After: SmarterCSV

rows = SmarterCSV.process('sample.csv')
rows[1]   # => { name: "Bob", age: 25 }   ← :city dropped, :age converted

# To keep nil values (match CSV.read behaviour):
rows = SmarterCSV.process('sample.csv', remove_empty_values: false)
rows[1]   # => { name: "Bob", age: 25, city: nil }
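
One consequence worth checking: when blank values are dropped, rows from the same file can end up with different key sets, and Rails' insert_all raises an ArgumentError if the hashes in a batch do not all share the same keys. Besides remove_empty_values: false, one option is to pad the rows yourself. The sketch below is plain Ruby; with_uniform_keys is a hypothetical helper, and the Alice row is made-up sample data.

```ruby
# Rows as they might look with empty values removed:
# Bob's blank :city was dropped, so the two hashes have different keys.
chunk = [
  { name: 'Alice', age: 30, city: 'Boston' },
  { name: 'Bob',   age: 25 },
]

# Hypothetical helper: give every row the union of all keys, filling gaps with nil.
def with_uniform_keys(rows)
  all_keys = rows.flat_map(&:keys).uniq
  rows.map { |row| all_keys.to_h { |key| [key, row[key]] } }
end

uniform = with_uniform_keys(chunk)
puts uniform.last.key?(:city)          # true, with a nil value
puts uniform.map(&:keys).uniq.size     # 1, every row has the same keys
```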

Plain Hash, not CSV::Row

With headers: true, CSV.read returns CSV::Row objects (collected in a CSV::Table). A CSV::Row is a hash-like wrapper with extra methods. SmarterCSV returns plain Ruby Hash objects, so there's no .to_h needed and no wrapper to unwrap.

Before: Ruby CSV

row = CSV.read('data.csv', headers: true).first
row.class        # => CSV::Row
row['name']      # => "Alice"   (string key)
row.to_h         # => {"name" => "Alice", "age" => "30"}  (still strings)

After: SmarterCSV

row = SmarterCSV.process('data.csv').first
row.class        # => Hash
row[:name]       # => "Alice"   (symbol key, no unwrapping needed)

Quick Reference

Ruby CSV	SmarterCSV	Notes
`CSV.read(f, headers: true).map(&:to_h)`	`SmarterCSV.process(f)`	Symbol keys, numeric conversion, whitespace stripped.
`CSV.read(f, headers: true, header_converters: :symbol).map(&:to_h)`	`SmarterCSV.process(f)`	Drop-in.
`CSV.table(f)`	`SmarterCSV.process(f)`	`CSV.table` returns a `CSV::Table` of `CSV::Row` objects; SmarterCSV returns a plain `Array` of `Hash`.
`CSV.parse(str, headers: true, header_converters: :symbol)`	`SmarterCSV.parse(str)`	Direct string parsing, new in 1.16.0.
`CSV.foreach(f, headers: true) { |r| }`	`SmarterCSV.each(f) { |r| }`	Row-by-row iteration; each row is already a plain Hash.
`converters: :numeric`	default	Automatic in SmarterCSV.
`converters: :date`	`value_converters: {col: DateConverter}`	Use explicit format strings — date formats are locale-dependent.
`liberal_parsing: true`	`on_bad_row: :collect`	Explicit quarantine gives you visibility.
`skip_blanks: true`	`remove_empty_hashes: true`	Default in SmarterCSV.
`row.to_h`	`row`	Already a plain Hash.
`row.headers`	`reader.headers`	Available on the `Reader` instance.

Beyond the Basics: What You Unlock

Once you're on SmarterCSV, these features come for free.

Batch processing for large files

SmarterCSV.process('big.csv', chunk_size: 500) do |chunk|
  MyModel.insert_all(chunk)   # bulk insert 500 rows at a time
end
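
As a mental model only (this is not how SmarterCSV is implemented, and it holds all rows in memory, which chunked reading of a file avoids), the block receives slices of rows much like each_slice would produce. The row data below is a stand-in.

```ruby
# Stand-in for 1,200 parsed CSV rows.
rows = (1..1_200).map { |i| { id: i } }

batch_sizes = []
rows.each_slice(500).with_index do |chunk, chunk_index|
  batch_sizes << chunk.size
  puts "chunk #{chunk_index}: #{chunk.size} rows"
end

puts batch_sizes.inspect   # [500, 500, 200]
```

The last chunk is usually smaller than chunk_size, so downstream code should not assume a fixed batch length.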

Handle bad rows without crashing

good_rows = SmarterCSV.process('data.csv', on_bad_row: :collect)

puts "#{good_rows.size} imported, #{SmarterCSV.errors[:bad_row_count]} bad rows"
SmarterCSV.errors[:bad_rows].each { |r| puts "Line #{r[:file_line_number]}: #{r[:error_message]}" }

⚠️ Fibers: SmarterCSV.errors relies on Thread.current, which is shared across all
fibers in the same thread. If you process CSV concurrently in fibers (Async, Falcon,
manual Fiber scheduling), use SmarterCSV::Reader directly instead — its errors are
scoped to the instance.

Or use SmarterCSV::Reader directly when you also need access to headers or other reader state after processing:

reader = SmarterCSV::Reader.new('data.csv', on_bad_row: :collect)
good_rows = reader.process

puts "#{good_rows.size} imported, #{reader.errors[:bad_rows].size} bad rows"
reader.errors[:bad_rows].each { |r| puts "Line #{r[:file_line_number]}: #{r[:error_message]}" }

Sentinel values (NULL, N/A, #VALUE!)

rows = SmarterCSV.process('export.csv',
  nil_values_matching: /\A(NULL|N\/A|NaN|#VALUE!)\z/i)
# Matching values are nil-ified and removed automatically
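
Before wiring a pattern like this into an import, it can help to try it against sample values. The sketch below exercises the same regular expression in plain Ruby, without SmarterCSV; the sample strings are made up.

```ruby
# The same pattern as above, anchored so only whole-field matches count.
SENTINEL = /\A(NULL|N\/A|NaN|#VALUE!)\z/i

samples = ['NULL', 'null', 'n/a', 'NaN', '#VALUE!', 'NULLABLE', 'N/A (pending)', '']
matches = samples.select { |value| value.match?(SENTINEL) }

puts matches.inspect   # ["NULL", "null", "n/a", "NaN", "#VALUE!"]
```

The \A and \z anchors are what keep values such as "NULLABLE" from being treated as sentinels.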

Custom type converters

Date formats are locale-dependent, so SmarterCSV doesn't guess. You supply an explicit format:

require 'date'

rows = SmarterCSV.process('records.csv',
  value_converters: {
    birth_date: ->(v) { v ? Date.strptime(v, '%m/%d/%Y') : nil },
    price:      ->(v) { v&.delete('$,')&.to_f },
    active:     ->(v) { v&.match?(/\Atrue\z/i) },
  })
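
Because these converters are ordinary lambdas, they can be checked on their own before an import runs. A minimal sketch, assuming the same three lambdas as above bound to local names chosen here for illustration:

```ruby
require 'date'

to_date   = ->(v) { v ? Date.strptime(v, '%m/%d/%Y') : nil }
to_price  = ->(v) { v&.delete('$,')&.to_f }
to_active = ->(v) { v&.match?(/\Atrue\z/i) }

birth_date = to_date.call('05/14/1990')    # Date for 14 May 1990
price      = to_price.call('$1,234.50')    # 1234.5
active     = to_active.call('TRUE')        # true
inactive   = to_active.call('yes')         # false, only "true" counts
missing    = to_date.call(nil)             # nil passes through untouched

# A day-first date does not fit the month-first format and raises
# instead of being silently misread.
bad_date_error =
  begin
    to_date.call('14/05/1990')
    nil
  rescue ArgumentError => e
    e
  end

puts birth_date.iso8601         # 1990-05-14
puts price                      # 1234.5
puts bad_date_error.class       # Date::Error
```

Whether a raised error should abort the import or be caught per row is a design choice for your application.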

Row-by-row iteration with full Enumerable

Before: Ruby CSV

CSV.foreach('data.csv', headers: true, header_converters: :symbol) do |row|
  MyModel.create(row.to_h)
end

After: SmarterCSV

SmarterCSV.each('data.csv') do |row|
  MyModel.create(row)   # already a Hash
end

# Full Enumerable — filter, map, lazy
active_users = SmarterCSV.each('data.csv').select { |r| r[:status] == 'active' }
first_ten    = SmarterCSV.each('data.csv').lazy.first(10)

Rails file upload

Accepting a CSV upload in a Rails controller is straightforward — pass the tempfile path directly:

# app/controllers/imports_controller.rb
def create
  file = params[:file]   # ActionDispatch::Http::UploadedFile

  SmarterCSV.process(file.path, chunk_size: 500) do |chunk|
    MyModel.insert_all(chunk)
  end

  redirect_to root_path, notice: "Import complete"
end

No temp file management, no manual header parsing. The uploaded file is processed in streaming chunks, keeping memory usage low regardless of file size.

Renaming headers to match your database

CSV column names rarely match your ActiveRecord attribute names. Use key_mapping: to rename them in one step — the mapping uses the normalized (downcased, underscored) header name as input:

# CSV headers: "First Name", "Last Name", "E-Mail", "Date of Birth"
# After normalization:  :first_name, :last_name, :e_mail, :date_of_birth

rows = SmarterCSV.process('contacts.csv',
  key_mapping: {
    first_name:    :given_name,
    last_name:     :family_name,
    e_mail:        :email,
    date_of_birth: :dob,
  })
# => [{given_name: "Alice", family_name: "Smith", email: "alice@example.com", dob: "1990-05-14"}, ...]

Map a key to nil to drop that column entirely:

key_mapping: { internal_id: nil, created_at: nil }   # these columns won't appear in results
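
To make the two stages concrete, here is a rough plain-Ruby approximation of normalization followed by mapping. It is a sketch under assumptions, not SmarterCSV's actual implementation: normalize_header, map_row, and KEY_MAPPING are hypothetical names, and the library's normalization rules may differ in edge cases.

```ruby
# Approximate header normalization: trim, downcase, non-alphanumerics to "_".
def normalize_header(header)
  header.strip.downcase.gsub(/[^a-z0-9]+/, '_').to_sym
end

# Keys are the normalized names; a nil target means "drop this column".
KEY_MAPPING = { first_name: :given_name, e_mail: :email, internal_id: nil }.freeze

def map_row(raw_row)
  raw_row.each_with_object({}) do |(header, value), out|
    key = normalize_header(header)
    key = KEY_MAPPING.fetch(key, key)   # unmapped keys pass through unchanged
    out[key] = value unless key.nil?
  end
end

row = map_row(
  'First Name'  => 'Alice',
  'Last Name'   => 'Smith',
  'E-Mail'      => 'alice@example.com',
  'Internal ID' => '17'
)

puts row.keys.inspect   # [:given_name, :last_name, :email]
```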

Select only the columns you need

Wide CSV files often have dozens of columns your application doesn't need. Use headers: { only: } to declare upfront which columns to keep — SmarterCSV skips everything else at the parser level, so unneeded fields are never allocated:

# CSV has 50 columns — you only need 3
rows = SmarterCSV.process('contacts.csv',
  headers: { only: [:email, :first_name, :last_name] })
# => [{email: "alice@example.com", first_name: "Alice", last_name: "Smith"}, ...]

Or exclude a known noisy column while keeping everything else:

rows = SmarterCSV.process('export.csv', headers: { except: [:internal_notes] })

Writing CSV from hashes

Before: Ruby CSV

# you manage headers manually, pass arrays
CSV.open('out.csv', 'w', write_headers: true, headers: ['name', 'age']) do |csv|
  csv << ['Alice', 30]
end

After: SmarterCSV

# pass hashes, headers discovered automatically
SmarterCSV.generate('out.csv') do |csv|
  csv << {name: 'Alice', age: 30}
  csv << {name: 'Bob',   age: 25}
end

Advanced Patterns: Where SmarterCSV Really Shines

These are scenarios where Ruby CSV falls short and SmarterCSV makes the solution clean and straightforward.

Parallel processing with Sidekiq

Each chunk is dispatched as an independent background job — the chunk API maps directly onto worker queues:

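SmarterCSV.process('users.csv', chunk_size: 100) do |chunk, chunk_index|
  puts "Queueing chunk #{chunk_index} (#{chunk.size} records)..."
  # One job per chunk. Sidekiq arguments must be JSON-native types,
  # so convert the symbol keys to strings before enqueueing.
  UserImportWorker.perform_async(chunk.map { |row| row.transform_keys(&:to_s) })
end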
# => imports run in parallel across all your Sidekiq workers
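
Keep in mind that Sidekiq serializes job arguments as JSON, so whatever the job receives has string keys, not symbols. The sketch below reproduces that round trip with only the JSON standard library; no Sidekiq is involved, and the row is made-up sample data.

```ruby
require 'json'

row = { first_name: 'Alice', age: 30 }

# What a worker would see after the arguments pass through JSON.
received = JSON.parse(JSON.generate([row])).first

puts received.keys.inspect            # ["first_name", "age"]
puts received[:first_name].inspect    # nil, the symbol key no longer exists
puts received['first_name'].inspect   # "Alice"

# Restore symbol keys inside the worker if the rest of the code expects them.
restored = received.transform_keys(&:to_sym)
puts restored == row                  # true
```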

Streaming directly from S3

SmarterCSV accepts any IO-like object — so you can stream a CSV directly from S3 without writing a temp file:

require 'aws-sdk-s3'

s3  = Aws::S3::Client.new(region: 'us-east-1')
obj = s3.get_object(bucket: 'my-bucket', key: 'imports/contacts.csv')

SmarterCSV::Reader.new(obj.body, chunk_size: 500).each_chunk do |chunk, _index|
  MyModel.insert_all(chunk)
end

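No disk I/O, no temp files, no cleanup. The S3 response body is handed to the parser as an IO object. Note that get_object called without a block or response_target buffers the whole body in memory first, so very large objects still need enough RAM.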

Production instrumentation

on_start, on_chunk, and on_complete hooks give you full visibility into long-running imports — feed them into your logger, StatsD, or any metrics backend:

SmarterCSV.process('large_import.csv',
  chunk_size: 1_000,

  on_start: ->(info) {
    Rails.logger.info "Import started: #{info[:input]} (#{info[:file_size]} bytes)"
  },

  on_chunk: ->(info) {
    Rails.logger.debug "Chunk #{info[:chunk_number]}: #{info[:rows_in_chunk]} rows " \
                       "(#{info[:total_rows_so_far]} total so far)"
  },

  on_complete: ->(stats) {
    Rails.logger.info "Import complete: #{stats[:total_rows]} rows in #{stats[:duration].round(2)}s, " \
                      "#{stats[:bad_rows]} bad rows"
    StatsD.histogram('csv.import.duration', stats[:duration])
  },
) { |chunk| MyModel.insert_all(chunk) }

Resumable imports with Rails ActiveJob

Rails 8.1 introduced ActiveJob::Continuable — jobs that can pause mid-execution (on deployment or queue drain) and resume exactly where they stopped. SmarterCSV's chunk_index maps directly onto the job cursor:

class ImportCsvJob < ApplicationJob
  include ActiveJob::Continuable

  def perform(file_path)
    step :import_rows do |step|
      SmarterCSV.process(file_path, chunk_size: 500) do |chunk, chunk_index|
        next if chunk_index < step.cursor.to_i   # skip already-processed chunks on resume

        MyModel.insert_all(chunk)
        step.set! chunk_index + 1
      end
    end
  end
end

On interruption after chunk 7, Rails persists the cursor as 8. On the next run, chunks 0–7 are parsed again but skipped before any database work, and processing resumes from chunk 8. Ruby CSV has no equivalent: you'd have to implement cursor tracking, row counting, and resume logic yourself.
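
The skip-and-resume logic can be simulated without Rails. The sketch below is a plain-Ruby model of the cursor behaviour described above; run_import and its stop_after parameter are hypothetical and stand in for the job being interrupted.

```ruby
# Twelve stand-in chunks, indexed 0..11.
chunks = Array.new(12) { |i| "chunk-#{i}" }

# Processes chunks from the cursor onward; stop_after simulates an interruption.
def run_import(chunks, cursor:, stop_after: nil)
  processed = []
  chunks.each_with_index do |chunk, chunk_index|
    next if chunk_index < cursor        # already handled in an earlier run
    processed << chunk                  # stands in for MyModel.insert_all(chunk)
    cursor = chunk_index + 1            # stands in for step.set!
    break if stop_after && chunk_index == stop_after
  end
  [processed, cursor]
end

first_run, cursor  = run_import(chunks, cursor: 0, stop_after: 7)
second_run, cursor = run_import(chunks, cursor: cursor)

puts first_run.size    # 8
puts second_run.first  # chunk-8
puts cursor            # 12
```

If a job were interrupted between the insert and the cursor update, that chunk would be processed again on resume, so it is safer for the per-chunk write to tolerate repeats.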

Bulk upsert — insert or update

For recurring imports where records may already exist, use upsert_all with a unique key. SmarterCSV's hashes pass directly — no transformation needed:

SmarterCSV.process('contacts.csv',
  chunk_size: 500,
  key_mapping: { e_mail: :email },   # normalize header to match DB column
) do |chunk|
  Contact.upsert_all(chunk, unique_by: :email)
  # inserts new records, updates existing ones — all in one query per chunk
end

That's It - Enjoy! ✨

Install the gem, change one line, check the four behavior differences (symbol keys, numeric conversion, empty value removal, plain Hash vs CSV::Row), and you're done.

The rest — batch processing, bad row handling, value converters, column selection — is there when you need it.

gem 'smarter_csv'

GitHub: github.com/tilo/smarter_csv
Docs: Full documentation
RubyGems: rubygems.org/gems/smarter_csv
