Seryl Lns
Building a Rails Engine #1 — Why Build a Data Import Engine?

Every non-trivial Rails app needs reliable data imports. DataPorter is a mountable Rails engine that provides upload, preview, mapping templates, dry-run and background import — so teams stop rebuilding the same fragile workflow again and again.


TL;DR: I’m building DataPorter — a Rails engine that turns data imports into a first-class workflow (upload → preview → import) instead of a pile of ad-hoc scripts.

Why build a data import engine?

“On Monday we got a CSV export from the marketing team — the import broke, 1,200 rows failed and we spent the day firefighting.”

Sound familiar? This is why I started DataPorter.


The problem

If you've worked on any non-trivial Rails application, you've probably written this code more than once:

  1. Upload a CSV (or fetch data from an API)
  2. Parse and validate each row
  3. Show the user what's about to be imported
  4. Persist the valid records to the database

Maybe it was a guest list for a hotel app. Maybe vendor data for an e-commerce platform. Maybe scraped listings from an external API.

The specifics change, but the workflow is always the same. And every time, we rebuild it from scratch: a controller action here, some CSV parsing there, a background job, maybe a progress bar if we're feeling fancy.

The result? Scattered import logic across controllers, services, and jobs. No consistency. No reuse. Every new import type means rewriting the same infrastructure.
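To make the pain concrete, here is a hypothetical sketch of the kind of one-off helper that gets rewritten for every import type — `import_guests` and its validation rule are illustrative, not part of DataPorter:

```ruby
require "csv"

# A hypothetical ad-hoc import helper, the kind that gets
# copy-pasted into every new project. Names are illustrative.
def import_guests(csv_text)
  valid, errors = [], []

  # with_index(2): data starts on line 2, after the header row.
  CSV.parse(csv_text, headers: true).each.with_index(2) do |row, line|
    # Ad-hoc validation, duplicated for every new import type.
    if row["email"].to_s.match?(/\A[^@\s]+@[^@\s]+\z/)
      valid << { first_name: row["first_name"], email: row["email"] }
    else
      errors << "line #{line}: invalid email #{row['email'].inspect}"
    end
  end

  [valid, errors]
end
```

Multiply this by every model that needs imports, and the duplication adds up fast.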


What we're building

DataPorter is a mountable Rails engine that owns the import infrastructure.

The host app only declares what to import and how to persist it.

One example target:

# app/importers/guests_target.rb
class GuestsTarget < DataPorter::Target
  label "Guests"
  model Guest
  sources :csv, :json

  columns do
    column :first_name, type: :string, required: true
    column :last_name,  type: :string, required: true
    column :email,      type: :email
    column :phone,      type: :phone
  end

  def persist(record, context:)
    Guest.create!(hotel: context.hotel, **record.attributes)
  end
end

That's it. DataPorter handles the rest: file upload, parsing, validation, preview UI, progress tracking, error reporting, and background processing.

The workflow, step by step:

Upload → Map columns → Preview → Dry-run → Import

Why not use what already exists?

There are existing solutions in the Rails ecosystem. Let's look at the two most common approaches.

The DIY approach

Most teams build custom import flows per model. It works, but it doesn't scale. By the third import type, you're copy-pasting controller actions and wishing you had abstracted earlier.

maintenance_tasks

Shopify's maintenance_tasks gem is excellent for one-off data processing scripts. It provides a UI, background processing, and CSV support.

But it solves a different problem. It's designed for fire-and-forget maintenance operations, not interactive import workflows.

| Aspect | maintenance_tasks | DataPorter |
| --- | --- | --- |
| Purpose | One-off scripts | Import workflows |
| Preview before import | No | Yes |
| Visual validation | No | Yes (complete/partial/missing) |
| Multi-step workflow | No (fire & forget) | Yes (parse → preview → import) |
| Real-time progress | No | Yes (ActionCable) |
| Data sources | CSV, ActiveRecord | CSV, JSON, API (extensible) |
| Auto-generated UI | Parameter form | Dynamic column table |

The key difference: DataPorter adds a human validation step between parsing and persisting. The user sees exactly what will be imported, with clear status indicators for each row, before anything touches the database.
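A minimal sketch of how such a per-row status could be derived — the three statuses match the table above, but the method and constants are my own illustration, not DataPorter's actual internals:

```ruby
# Classify a parsed row against required and optional columns.
# :complete – every column has a value
# :partial  – required columns present, some optional ones blank
# :missing  – at least one required column is blank
REQUIRED = %i[first_name last_name].freeze
OPTIONAL = %i[email phone].freeze

def row_status(row)
  blank = ->(key) { row[key].nil? || row[key].to_s.strip.empty? }

  return :missing if REQUIRED.any?(&blank)
  return :partial if OPTIONAL.any?(&blank)

  :complete
end
```

The preview table can then color each row by status, so the user knows exactly what will happen before confirming the import.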


Architecture overview

DataPorter is split into two clear layers:

┌─────────────────────────────────────┐
│  DataPorter (the gem)               │
│                                     │
│  Engine, Model, State Machine,      │
│  Sources, Orchestrator, Jobs,       │
│  ActionCable, UI, DSL, Registry,    │
│  Generators                         │
└──────────────┬──────────────────────┘
               │ mount + configure + define targets
┌──────────────┴──────────────────────┐
│  Host App                           │
│                                     │
│  Initializer, Target files,         │
│  Auth (parent controller),          │
│  Style overrides (optional)         │
└─────────────────────────────────────┘

The gem owns the infrastructure. The host app owns the business logic. This separation is the core design principle we'll follow throughout the series.
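In practice, "mount + configure" on the host side could look like this sketch — the mount point is the host's choice, and the configuration option shown is illustrative, not the final API:

```ruby
# config/routes.rb — the host app mounts the engine at a path of its choice.
Rails.application.routes.draw do
  mount DataPorter::Engine => "/imports"
end

# config/initializers/data_porter.rb — global configuration.
# The option name below is an assumption, not the gem's settled API.
DataPorter.configure do |config|
  config.parent_controller = "AdminController" # inherit the host's auth
end
```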


The tech stack

Here's what we'll use and why:

| Dependency | Role | Why |
| --- | --- | --- |
| store_model | Typed JSONB attributes | Store import records as structured data without extra tables |
| phlex | View components | Ruby-native views, easier to test and namespace than ERB |
| turbo-rails | Page updates | Turbo Frames for partial reloads during the import flow |
| stimulus | JS behavior | Progress bar updates via ActionCable |
| Tailwind CSS | Styling | Scoped with a dp- prefix to avoid host app conflicts |

We'll also rely heavily on Rails built-ins: ActiveJob for background processing, ActionCable for real-time updates, ActiveStorage for file uploads, and an enum-based state machine for the import lifecycle.
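For that lifecycle, a plain Rails enum goes a long way. A sketch of what the import model's state machine could look like — the state names are assumptions, not the gem's final ones:

```ruby
# app/models/data_porter/import.rb (sketch — state names are illustrative)
module DataPorter
  class Import < ApplicationRecord
    # Rails 7+ enum syntax; each state maps to an integer column,
    # and Rails generates predicates (previewed?) and scopes for free.
    enum :status, {
      pending:   0, # file uploaded, nothing parsed yet
      parsing:   1, # source is being read in a background job
      previewed: 2, # rows parsed, waiting for human validation
      importing: 3, # persist step running
      completed: 4,
      failed:    5
    }, default: :pending
  end
end
```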


What this series covers

Here's the roadmap — each part is a standalone article:

  1. Why build a data import engine? (this article)
  2. Scaffolding a Rails Engine gem — gem structure, Engine setup
  3. Configuration DSL — making the gem flexible
  4. StoreModel & JSONB — modeling import data without extra tables
  5. Target DSL — one file = one import type
  6. CSV parsing with Sources — the first end-to-end flow
  7. The Orchestrator — coordinating parse and import
  8. ActionCable & Stimulus — real-time progress
  9. Phlex & Tailwind UI — auto-generated preview tables
  10. Controllers & routing — engine controllers done right
  11. Generators — install in one command
  12. JSON & API sources — beyond CSV
  13. Testing a Rails Engine — specs for an isolated engine
  14. Dry Run mode — validate against the database before importing
  15. Publishing & retrospective — from repo to rubygems.org

Plus, maybe, a second batch of parts covering more advanced features.


Recap

  • Data import is a recurring pattern in Rails apps that deserves a reusable solution
  • DataPorter provides the infrastructure (upload, parse, preview, import) while the host app defines the business logic
  • The multi-step workflow with a human validation step is what sets it apart from existing tools
  • We're building a proper mountable Rails engine with a clean separation between gem and host app

Next up

In the next article, we'll run bundle gem data_porter, set up the Rails Engine with isolate_namespace, structure our directories, and configure the gemspec with our dependencies. We'll make our first architectural decisions — and explain why they matter.


This is part 1 of the series "Building DataPorter - A Data Import Engine for Rails". Next: Scaffolding a Rails Engine gem

If you liked this intro and want to follow the series:

  • 💎 Try it : data_porter
  • Follow me on dev.to for the next parts
  • Open an issue if you want a specific use-case covered
  • 🌟 If you find this useful, star the repo — it helps validate the project and keep it moving

GitHub: SerylLns/data_porter | RubyGems: data_porter
