Every non-trivial Rails app needs reliable data imports. DataPorter is a mountable Rails engine that provides upload, preview, mapping templates, dry-run and background import — so teams stop rebuilding the same fragile workflow again and again.
TL;DR: I’m building DataPorter — a Rails engine that turns data imports into a first-class workflow (upload → preview → import) instead of a pile of ad-hoc scripts.
## Why build a data import engine?
> “On Monday we got a CSV export from the marketing team — the import broke, 1,200 rows failed and we spent the day firefighting.”
Sound familiar? This is why I started DataPorter.
## The problem
If you've worked on any non-trivial Rails application, you've probably written this code more than once:
- Upload a CSV (or fetch data from an API)
- Parse and validate each row
- Show the user what's about to be imported
- Persist the valid records to the database
Maybe it was a guest list for a hotel app. Maybe vendor data for an e-commerce platform. Maybe scraped listings from an external API.
The specifics change, but the workflow is always the same. And every time, we rebuild it from scratch: a controller action here, some CSV parsing there, a background job, maybe a progress bar if we're feeling fancy.
The result? Scattered import logic across controllers, services, and jobs. No consistency. No reuse. Every new import type means rewriting the same infrastructure.
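The pattern usually ends up looking something like this — a hand-rolled parse-and-validate loop, rewritten slightly differently for every model (a deliberately simplified sketch; the field names and rules are illustrative, not from any real app):

```ruby
require "csv"

# A typical hand-rolled import: parse the CSV, validate each row inline,
# collect whatever fails. Every new import type means a new copy of this loop.
def import_guests(csv_string)
  valid, errors = [], []
  CSV.parse(csv_string, headers: true).each_with_index do |row, i|
    # Ad-hoc validation rules, duplicated across every importer in the app.
    if row["email"].to_s.include?("@") && !row["first_name"].to_s.empty?
      valid << { first_name: row["first_name"], email: row["email"] }
    else
      errors << "row #{i + 1}: missing or invalid fields"
    end
  end
  [valid, errors]
end
```

It works — until the third importer, when the duplication starts to hurt.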
## What we're building
DataPorter is a mountable Rails engine that owns the import infrastructure.
The host app only declares what to import and how to persist it.
One example target:
```ruby
# app/importers/guests_target.rb
class GuestsTarget < DataPorter::Target
  label "Guests"
  model Guest
  sources :csv, :json

  columns do
    column :first_name, type: :string, required: true
    column :last_name,  type: :string, required: true
    column :email,      type: :email
    column :phone,      type: :phone
  end

  def persist(record, context:)
    Guest.create!(hotel: context.hotel, **record.attributes)
  end
end
```
That's it. DataPorter handles the rest: file upload, parsing, validation, preview UI, progress tracking, error reporting, and background processing.
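On the host-app side, wiring the engine up could look something like this. Mounting is standard Rails; the initializer keys shown here are assumptions about the eventual API, not the final shape:

```ruby
# config/routes.rb — mounting the engine is ordinary Rails; the path is up to you.
Rails.application.routes.draw do
  mount DataPorter::Engine => "/imports"
end

# config/initializers/data_porter.rb — hypothetical configuration keys.
DataPorter.configure do |config|
  # Auth stays in the host app: engine controllers inherit from this.
  config.parent_controller = "AdminController"
end
```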
The workflow, from upload to import:
Upload → Map columns → Preview → Dry-run → Import
## Why not use what already exists?
There are existing solutions in the Rails ecosystem. Let's look at the two most common approaches.
### The DIY approach
Most teams build custom import flows per model. It works, but it doesn't scale. By the third import type, you're copy-pasting controller actions and wishing you had abstracted earlier.
### maintenance_tasks
Shopify's maintenance_tasks gem is excellent for one-off data processing scripts. It provides a UI, background processing, and CSV support.
But it solves a different problem. It's designed for fire-and-forget maintenance operations, not interactive import workflows.
| Aspect | maintenance_tasks | DataPorter |
|---|---|---|
| Purpose | One-off scripts | Import workflows |
| Preview before import | No | Yes |
| Visual validation | No | Yes (complete/partial/missing) |
| Multi-step workflow | No (fire & forget) | Yes (parse → preview → import) |
| Real-time progress | No | Yes (ActionCable) |
| Data sources | CSV, ActiveRecord | CSV, JSON, API (extensible) |
| Auto-generated UI | Parameter form | Dynamic column table |
The key difference: DataPorter adds a human validation step between parsing and persisting. The user sees exactly what will be imported, with clear status indicators for each row, before anything touches the database.
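The per-row status indicators could be derived along these lines — a minimal sketch of the idea, not DataPorter's actual implementation: a row is `:complete` when every required column has a value, `:partial` when only some do, and `:missing` otherwise.

```ruby
# Classify a parsed row for the preview table.
# `row` is a hash of column => value; `required` lists the required columns.
def row_status(row, required:)
  present = required.count { |col| !row[col].to_s.strip.empty? }
  return :complete if present == required.size

  present.positive? ? :partial : :missing
end
```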
## Architecture overview
DataPorter is split into two clear layers:
```
┌─────────────────────────────────────┐
│ DataPorter (the gem)                │
│                                     │
│ Engine, Model, State Machine,       │
│ Sources, Orchestrator, Jobs,        │
│ ActionCable, UI, DSL, Registry,     │
│ Generators                          │
└──────────────┬──────────────────────┘
               │ mount + configure + define targets
┌──────────────┴──────────────────────┐
│ Host App                            │
│                                     │
│ Initializer, Target files,          │
│ Auth (parent controller),           │
│ Style overrides (optional)          │
└─────────────────────────────────────┘
```
The gem owns the infrastructure. The host app owns the business logic. This separation is the core design principle we'll follow throughout the series.
## The tech stack
Here's what we'll use and why:
| Dependency | Role | Why |
|---|---|---|
| store_model | Typed JSONB attributes | Store import records as structured data without extra tables |
| phlex | View components | Ruby-native views, easier to test and namespace than ERB |
| turbo-rails | Page updates | Turbo Frames for partial reloads during the import flow |
| stimulus | JS behavior | Progress bar updates via ActionCable |
| Tailwind CSS | Styling | Scoped with dp- prefix to avoid host app conflicts |
We'll also rely heavily on Rails built-ins: ActiveJob for background processing, ActionCable for real-time updates, ActiveStorage for file uploads, and an enum-based state machine for the import lifecycle.
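The enum-based lifecycle might be sketched as a small transition table. The state names here are assumptions (the real gem may use different ones); the point is that each state only allows specific successors:

```ruby
# Allowed transitions for an import: each state maps to its valid successors.
TRANSITIONS = {
  pending:    [:parsing],
  parsing:    [:previewing, :failed],
  previewing: [:importing],
  importing:  [:completed, :failed],
  completed:  [],
  failed:     []
}.freeze

# Returns the new state, or raises if the move isn't allowed.
def transition(state, to)
  raise ArgumentError, "#{state} -> #{to} not allowed" unless TRANSITIONS.fetch(state).include?(to)

  to
end
```

In the engine itself, the states would live in a Rails `enum` on the import model, with guards like this around each state change.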
## What this series covers
Here's the roadmap — each part is a standalone article:
- Why build a data import engine? (this article)
- Scaffolding a Rails Engine gem — gem structure, Engine setup
- Configuration DSL — making the gem flexible
- StoreModel & JSONB — modeling import data without extra tables
- Target DSL — one file = one import type
- CSV parsing with Sources — the first end-to-end flow
- The Orchestrator — coordinating parse and import
- ActionCable & Stimulus — real-time progress
- Phlex & Tailwind UI — auto-generated preview tables
- Controllers & routing — engine controllers done right
- Generators — install in one command
- JSON & API sources — beyond CSV
- Testing a Rails Engine — specs for an isolated engine
- Dry Run mode — validate against the database before importing
- Publishing & retrospective — from repo to rubygems.org
- Maybe a part 2 with more advanced features
## Recap
- Data import is a recurring pattern in Rails apps that deserves a reusable solution
- DataPorter provides the infrastructure (upload, parse, preview, import) while the host app defines the business logic
- The preview-before-import workflow with human validation is what sets it apart from existing tools
- We're building a proper mountable Rails engine with a clean separation between gem and host app
## Next up
In the next article, we'll run `bundle gem data_porter`, set up the Rails Engine with `isolate_namespace`, structure our directories, and configure the gemspec with our dependencies. We'll make our first architectural decisions — and explain why they matter.
This is part 1 of the series "Building DataPorter - A Data Import Engine for Rails". Next: Scaffolding a Rails Engine gem
If you liked this intro and want to follow the series:
- 💎 Try it: data_porter
- Follow me on dev.to for the next parts
- Open an issue if you want a specific use-case covered
- 🌟 If you find this useful, star the repo — it helps validate the project and keep it moving
GitHub: SerylLns/data_porter | RubyGems: data_porter
