Seryl Lns
Building a Rails Engine #1 — Why Build a Data Import Engine?

Every non-trivial Rails app needs reliable data imports. DataPorter is a mountable Rails engine that provides upload, preview, mapping templates, dry-run and background import — so teams stop rebuilding the same fragile workflow again and again.


TL;DR: I’m building DataPorter — a Rails engine that turns data imports into a first-class workflow (upload → preview → import) instead of a pile of ad-hoc scripts.

Why build a data import engine?

“On Monday we got a CSV export from the marketing team — the import broke, 1,200 rows failed and we spent the day firefighting.”

Sound familiar? This is why I started DataPorter.


The problem

If you've worked on any non-trivial Rails application, you've probably written this code more than once:

  1. Upload a CSV (or fetch data from an API)
  2. Parse and validate each row
  3. Show the user what's about to be imported
  4. Persist the valid records to the database

Maybe it was a guest list for a hotel app. Maybe vendor data for an e-commerce platform. Maybe scraped listings from an external API.

The specifics change, but the workflow is always the same. And every time, we rebuild it from scratch: a controller action here, some CSV parsing there, a background job, maybe a progress bar if we're feeling fancy.

The result? Scattered import logic across controllers, services, and jobs. No consistency. No reuse. Every new import type means rewriting the same infrastructure.
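To make the pain concrete, here is a hypothetical sketch of the kind of one-off helper that gets rewritten for every import type — `import_guests` and its validation rule are illustrative, not part of DataPorter:

```ruby
require "csv"

# A hypothetical ad-hoc import helper, the kind that gets
# copy-pasted into every new project. Names are illustrative.
def import_guests(csv_text)
  valid, errors = [], []

  # with_index(2): data starts on line 2, after the header row.
  CSV.parse(csv_text, headers: true).each.with_index(2) do |row, line|
    # Ad-hoc validation, duplicated for every new import type.
    if row["email"].to_s.match?(/\A[^@\s]+@[^@\s]+\z/)
      valid << { first_name: row["first_name"], email: row["email"] }
    else
      errors << "line #{line}: invalid email #{row['email'].inspect}"
    end
  end

  [valid, errors]
end
```

Multiply this by every model that needs imports, and the duplication adds up fast.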


What we're building

DataPorter is a mountable Rails engine that owns the import infrastructure.

The host app only declares what to import and how to persist it.

One example target:

# app/importers/guests_target.rb
class GuestsTarget < DataPorter::Target
  label "Guests"
  model Guest
  sources :csv, :json

  columns do
    column :first_name, type: :string, required: true
    column :last_name,  type: :string, required: true
    column :email,      type: :email
    column :phone,      type: :phone
  end

  def persist(record, context:)
    Guest.create!(hotel: context.hotel, **record.attributes)
  end
end

That's it. DataPorter handles the rest: file upload, parsing, validation, preview UI, progress tracking, error reporting, and background processing.

The workflow, step by step:

Upload → Map columns → Preview → Dry-run → Import

Why not use what already exists?

There are existing solutions in the Rails ecosystem. Let's look at the two most common approaches.

The DIY approach

Most teams build custom import flows per model. It works, but it doesn't scale. By the third import type, you're copy-pasting controller actions and wishing you had abstracted earlier.

maintenance_tasks

Shopify's maintenance_tasks gem is excellent for one-off data processing scripts. It provides a UI, background processing, and CSV support.

But it solves a different problem. It's designed for fire-and-forget maintenance operations, not interactive import workflows.

| Aspect | maintenance_tasks | DataPorter |
| --- | --- | --- |
| Purpose | One-off scripts | Import workflows |
| Preview before import | No | Yes |
| Visual validation | No | Yes (complete/partial/missing) |
| Multi-step workflow | No (fire & forget) | Yes (parse → preview → import) |
| Real-time progress | No | Yes (ActionCable) |
| Data sources | CSV, ActiveRecord | CSV, JSON, API (extensible) |
| Auto-generated UI | Parameter form | Dynamic column table |

The key difference: DataPorter adds a human validation step between parsing and persisting. The user sees exactly what will be imported, with clear status indicators for each row, before anything touches the database.
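A minimal sketch of how such a per-row status could be derived — the three statuses match the table above, but the method and constants are my own illustration, not DataPorter's actual internals:

```ruby
# Classify a parsed row against required and optional columns.
# :complete – every column has a value
# :partial  – required columns present, some optional ones blank
# :missing  – at least one required column is blank
REQUIRED = %i[first_name last_name].freeze
OPTIONAL = %i[email phone].freeze

def row_status(row)
  blank = ->(key) { row[key].nil? || row[key].to_s.strip.empty? }

  return :missing if REQUIRED.any?(&blank)
  return :partial if OPTIONAL.any?(&blank)

  :complete
end
```

The preview table can then color each row by status, so the user knows exactly what will happen before confirming the import.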


Architecture overview

DataPorter is split into two clear layers:

┌─────────────────────────────────────┐
│  DataPorter (the gem)               │
│                                     │
│  Engine, Model, State Machine,      │
│  Sources, Orchestrator, Jobs,       │
│  ActionCable, UI, DSL, Registry,    │
│  Generators                         │
└──────────────┬──────────────────────┘
               │ mount + configure + define targets
┌──────────────┴──────────────────────┐
│  Host App                           │
│                                     │
│  Initializer, Target files,         │
│  Auth (parent controller),          │
│  Style overrides (optional)         │
└─────────────────────────────────────┘

The gem owns the infrastructure. The host app owns the business logic. This separation is the core design principle we'll follow throughout the series.
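In practice, "mount + configure" on the host side could look like this sketch — the mount point is the host's choice, and the configuration option shown is illustrative, not the final API:

```ruby
# config/routes.rb — the host app mounts the engine at a path of its choice.
Rails.application.routes.draw do
  mount DataPorter::Engine => "/imports"
end

# config/initializers/data_porter.rb — global configuration.
# The option name below is an assumption, not the gem's settled API.
DataPorter.configure do |config|
  config.parent_controller = "AdminController" # inherit the host's auth
end
```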


The tech stack

Here's what we'll use and why:

| Dependency | Role | Why |
| --- | --- | --- |
| store_model | Typed JSONB attributes | Store import records as structured data without extra tables |
| phlex | View components | Ruby-native views, easier to test and namespace than ERB |
| turbo-rails | Page updates | Turbo Frames for partial reloads during the import flow |
| stimulus | JS behavior | Progress bar updates via ActionCable |
| Tailwind CSS | Styling | Scoped with a dp- prefix to avoid host app conflicts |

We'll also rely heavily on Rails built-ins: ActiveJob for background processing, ActionCable for real-time updates, ActiveStorage for file uploads, and an enum-based state machine for the import lifecycle.
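For that lifecycle, a plain Rails enum goes a long way. A sketch of what the import model's state machine could look like — the state names are assumptions, not the gem's final ones:

```ruby
# app/models/data_porter/import.rb (sketch — state names are illustrative)
module DataPorter
  class Import < ApplicationRecord
    # Rails 7+ enum syntax; each state maps to an integer column,
    # and Rails generates predicates (previewed?) and scopes for free.
    enum :status, {
      pending:   0, # file uploaded, nothing parsed yet
      parsing:   1, # source is being read in a background job
      previewed: 2, # rows parsed, waiting for human validation
      importing: 3, # persist step running
      completed: 4,
      failed:    5
    }, default: :pending
  end
end
```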


What this series covers

Here's the roadmap — each part is a standalone article:

  1. Why build a data import engine? (this article)
  2. Scaffolding a Rails Engine gem — gem structure, Engine setup
  3. Configuration DSL — making the gem flexible
  4. StoreModel & JSONB — modeling import data without extra tables
  5. Target DSL — one file = one import type
  6. CSV parsing with Sources — the first end-to-end flow
  7. The Orchestrator — coordinating parse and import
  8. ActionCable & Stimulus — real-time progress
  9. Phlex & Tailwind UI — auto-generated preview tables
  10. Controllers & routing — engine controllers done right
  11. Generators — install in one command
  12. JSON & API sources — beyond CSV
  13. Testing a Rails Engine — specs for an isolated engine
  14. Dry Run mode — validate against the database before importing
  15. Publishing & retrospective — from repo to rubygems.org

Plus, maybe, a second batch of parts covering more advanced features.


Recap

  • Data import is a recurring pattern in Rails apps that deserves a reusable solution
  • DataPorter provides the infrastructure (upload, parse, preview, import) while the host app defines the business logic
  • The multi-step workflow with a human validation step is what sets it apart from existing tools
  • We're building a proper mountable Rails engine with a clean separation between gem and host app

Next up

In the next article, we'll run bundle gem data_porter, set up the Rails Engine with isolate_namespace, structure our directories, and configure the gemspec with our dependencies. We'll make our first architectural decisions — and explain why they matter.


This is part 1 of the series "Building DataPorter - A Data Import Engine for Rails". Next: Scaffolding a Rails Engine gem

If you liked this intro and want to follow the series:

  • 💎 Try it : data_porter
  • Follow me on dev.to for the next parts
  • Open an issue if you want a specific use-case covered
  • 🌟 If you find this useful, star the repo — it helps validate the project and keep it moving

GitHub: SerylLns/data_porter | RubyGems: data_porter
