Seryl Lns

Posted on Feb 24

Building a Rails Engine #5 — Designing a Target DSL

#ruby #rails #opensource #architecture

Designing a Target DSL

How to make each import type a single, self-describing Ruby class -- one file, zero boilerplate.

Context

This is part 5 of the series where we build DataPorter, a mountable Rails engine for data import workflows. In part 4, we modeled import records, errors, and reports using StoreModel and JSONB columns -- the data structures the engine operates on.

Now we need the layer that describes an import: what model does it target, what columns does it expect, how do CSV headers map to those columns? This is the Target DSL and the Registry that makes targets discoverable.

The problem

Before DataPorter, adding a new import type meant copying 200 lines from another controller action and hoping nothing broke:

# Before: scattered across controller, service, and config
class GuestsController < ApplicationController
  def import
    file = params[:file]
    rows = CSV.parse(file.read, headers: true)
    errors = []
    imported = 0

    rows.each_with_index do |row, i|
      guest = Guest.new
      guest.first_name = row["Prenom"] || row["First Name"] || row["first_name"]
      guest.last_name  = row["Nom"]    || row["Last Name"]  || row["last_name"]
      guest.email      = row["Email"]  || row["email"]
      # ... 30 more lines of mapping, validation, dedup, error handling
      if guest.save
        imported += 1
      else
        errors << { row: i + 1, messages: guest.errors.full_messages }
      end
    end

    flash[:notice] = "#{imported} imported, #{errors.size} errors"
    redirect_to guests_path
  end
end

Every import type repeats this pattern with slightly different field names. Column mapping logic, error collection, validation -- all reimplemented from scratch. Adding a product import means copying the guest import and changing field names. Six months later, you have five imports with five subtly different error handling strategies.

We want a developer to open a single file, declare what their import looks like, and have the engine handle everything else:

# After: one short file, everything declared
class GuestTarget < DataPorter::Target
  label "Guests"
  model_name "Guest"
  columns do
    column :first_name, type: :string, required: true
    column :last_name,  type: :string, required: true
    column :email,      type: :email
  end

  csv_mapping do
    map "Prenom" => :first_name
    map "Nom"    => :last_name
  end

  def persist(record, context:)
    Guest.create!(record.data)
  end
end

No registration callbacks and no controller configuration: the only wiring outside this file is a one-line registration call in an initializer.

How it fits together

Before diving into the code, here is how the three pieces connect:

Host App                            DataPorter Engine
─────────                           ─────────────────

GuestTarget                         Registry
  ├─ label, columns (DSL)     ──▶    register(:guests, GuestTarget)
  └─ persist (hook)                   │
                                      ▼
                                    Orchestrator (part 7)
                                      ├─ Registry.find(:guests)
                                      ├─ target._columns  (schema)
                                      └─ target.persist() (runtime)

                                    UI Controller
                                      └─ Registry.available (dropdown)

The Target declares what the import looks like. The Registry makes it discoverable. The Orchestrator (part 7) and the UI both query the Registry -- one to process imports, the other to populate the import type dropdown. The host app only writes Target classes and registers them; the engine handles everything else.

What we're building

Here is what a complete target definition looks like in the host app:

# app/data_porter/targets/guest_target.rb
class GuestTarget < DataPorter::Target
  label "Guests"
  model_name "Guest"
  icon "fas fa-users"
  sources :csv, :json

  columns do
    column :first_name, type: :string, required: true
    column :last_name,  type: :string, required: true
    column :email,      type: :email
  end

  csv_mapping do
    map "Prenom" => :first_name
    map "Nom"    => :last_name
  end

  deduplicate_by :email

  def persist(record, context:)
    Guest.create!(record.data)
  end
end

deduplicate_by :email tells the engine to check for existing records with the same email before inserting -- re-importing the same CSV won't create duplicates.

That is the entire file. The class-level DSL declares metadata and column schema. The instance method persist handles the actual write. Once the class is registered, the engine finds it through the Registry and wires it into the UI and orchestration layer.
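As a rough illustration of what the deduplicate_by declaration implies, the sketch below skips incoming rows that match an existing record on the declared keys. The method name duplicate? and the in-memory existing array are assumptions made for this example; the engine's real lookup (most likely a database query) is not shown in this part.

```ruby
# Hypothetical sketch: using declared dedup keys to skip rows.
# `existing` stands in for records that are already in the database.
def duplicate?(row, existing, dedup_keys)
  return false if dedup_keys.nil? || dedup_keys.empty?

  existing.any? do |record|
    # A row is a duplicate only if every declared key matches.
    dedup_keys.all? { |key| record[key] == row[key] }
  end
end

existing = [{ first_name: "Ana", last_name: "Diaz", email: "ana@example.com" }]
incoming = [
  { first_name: "Ana", last_name: "Diaz",  email: "ana@example.com" }, # same email
  { first_name: "Luc", last_name: "Morin", email: "luc@example.com" }
]

fresh = incoming.reject { |row| duplicate?(row, existing, [:email]) }
p fresh.map { |row| row[:email] } # ["luc@example.com"]
```

Passing the keys in as plain data keeps the check independent of any one target, which is the point of declaring them at the class level.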

Implementation

Step 1 -- The Column struct

Before the Target itself, we need a value object for columns. Each column has a name, a type for validation, a required flag, a display label, and an open-ended options hash for type-specific settings like date formats.

# lib/data_porter/dsl/column.rb
module DataPorter
  module DSL
    Column = Struct.new(:name, :type, :required, :label, :options, keyword_init: true) do
      def initialize(name:, type: :string, required: false, label: nil, **options)
        super(
          name: name.to_sym,
          type: type.to_sym,
          required: required,
          label: label || name.to_s.humanize,
          options: options
        )
      end
    end
  end
end

A Struct gives us value-based equality, to_h, and members for free. Note that Struct instances are mutable; if immutability matters, freeze the column after construction. The constructor coerces name and type to symbols so callers can pass either strings or symbols. The label falls back to humanize -- one less thing to type for the common case, but overridable when the generated label doesn't fit: humanize turns :full_name into "Full name", so column :full_name, label: "Full Name" restores the capital N. The **options splat captures anything else (like format: "%d/%m/%Y" for dates) and tucks it into the options hash, keeping the struct's interface stable as we add type-specific features.
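To see the coercion, the label fallback, and the options capture in action, here is a stand-alone copy of the struct that runs in plain Ruby. One assumption to flag: humanize comes from ActiveSupport in the real engine, so this sketch substitutes a small approximation that behaves the same for simple snake_case names.

```ruby
# Stand-alone copy of the Column struct for experimentation.
# Assumption: the label fallback approximates ActiveSupport's `humanize`
# so the snippet has no dependencies.
Column = Struct.new(:name, :type, :required, :label, :options, keyword_init: true) do
  def initialize(name:, type: :string, required: false, label: nil, **options)
    super(
      name: name.to_sym,
      type: type.to_sym,
      required: required,
      label: label || name.to_s.tr("_", " ").capitalize,
      options: options
    )
  end
end

email = Column.new(name: "email", type: "email")
born  = Column.new(name: :born_on, type: :date, format: "%d/%m/%Y")

p email.name     # :email (the string was coerced to a symbol)
p email.label    # "Email" (generated from the name)
p email.required # false (default)
p born.options   # the extra `format:` keyword ends up in the options hash
p email == Column.new(name: :email, type: :email) # true (value equality)
```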

Step 2 -- The Target base class

The Target is where the DSL lives. All the declarative methods (label, model_name, columns, etc.) are class methods on a base class that host-app targets inherit from. Instance methods provide hook points for the import lifecycle.

# lib/data_porter/target.rb
module DataPorter
  class Target
    class << self
      attr_reader :_label, :_model_name, :_icon, :_sources,
                  :_columns, :_csv_mappings, :_dedup_keys

      def label(value)      = @_label = value
      def model_name(value) = @_model_name = value
      def icon(value)       = @_icon = value

      def sources(*types)
        @_sources = types.map(&:to_sym)
      end

      def columns(&)
        @_columns = []
        instance_eval(&)
      end

      def column(name, **)
        @_columns << DSL::Column.new(name: name, **)
      end

      def csv_mapping(&)
        @_csv_mappings = {}
        instance_eval(&)
      end

      def map(hash)
        @_csv_mappings.merge!(hash)
      end

      def deduplicate_by(*keys)
        @_dedup_keys = keys.map(&:to_sym)
      end
    end
  end
end

Every DSL method is a class method that stores its value in a class instance variable (@_label, not @@label). The underscore prefix is a convention to signal "this is DSL storage, not your public API." The columns block uses instance_eval to execute column calls in the class context, which gives us the nested block syntax without requiring the caller to reference self.
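The isolation that class instance variables provide is easy to check in isolation. The sketch below uses MiniTarget, a trimmed-down stand-in for the base class written for this example (it keeps only label and columns, and stores columns as plain hashes rather than Column structs).

```ruby
# Minimal stand-in for the Target base class (only `label` and `columns`).
class MiniTarget
  class << self
    attr_reader :_label, :_columns

    def label(value)
      @_label = value
    end

    def columns(&block)
      @_columns = []
      instance_eval(&block) # run the block with self set to the target class
    end

    def column(name, **opts)
      @_columns << { name: name.to_sym, **opts }
    end
  end
end

class GuestTarget < MiniTarget
  label "Guests"
  columns do
    column :email, type: :email
  end
end

class ProductTarget < MiniTarget
  label "Products"
end

# A subclass of a concrete target starts with empty declarations.
class VipGuestTarget < GuestTarget; end

p GuestTarget._label      # "Guests"
p ProductTarget._label    # "Products"
p ProductTarget._columns  # nil: `columns` was never called on this class
p VipGuestTarget._label   # nil: class instance variables are not inherited
p MiniTarget._label       # nil: the base class is untouched
```

One consequence worth keeping in mind: a class that inherits from a concrete target has to redeclare its label and columns, because nothing is copied down from the parent.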

The instance-level hooks provide the extensibility points for the import lifecycle:

# lib/data_porter/target.rb (instance methods)
def transform(record)  = record
def validate(record)   = nil
def persist(_record, context:) = raise NotImplementedError
def after_import(_results, context:) = nil
def on_error(_record, _error, context:) = nil

transform and validate are no-ops by default -- override them if you need custom data munging or cross-field validation beyond type checking. persist raises NotImplementedError because every target must define how records get written. after_import and on_error are optional hooks for cleanup, notifications, or error recovery. The context: keyword argument carries the host app context (current user, tenant, etc.) that we set up in the configuration DSL back in part 3.

The split between class methods (declaration) and instance methods (execution) is deliberate. Class methods describe what the import is. Instance methods describe what happens during the import. The Orchestrator (part 7) will call target_class._columns to know the schema, then instantiate the target and call target.persist(record, context: ctx) for each row. Keeping these on different layers prevents the declaration phase from depending on runtime state.
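To make the declaration/execution split concrete, here is a hedged sketch of how a runner could drive the instance hooks. Everything in it is an assumption for illustration: run_import is a stand-in for the Orchestrator that part 7 builds, records are plain hashes rather than the StoreModel records from part 4, and validate signals failure by raising, which may not match the engine's real contract.

```ruby
# Hypothetical sketch of the hook lifecycle. `BaseTarget` mirrors the default
# hooks from the article; `run_import` is NOT the real Orchestrator.
class BaseTarget
  def transform(record) = record
  def validate(record)  = nil
  def persist(_record, context:) = raise(NotImplementedError)
  def on_error(_record, _error, context:) = nil
end

class GuestTarget < BaseTarget
  STORE = [] # in-memory stand-in for the guests table

  # Normalize the email before validation.
  def transform(record)
    record.merge(email: record[:email].to_s.strip.downcase)
  end

  # Assumption: raising rejects the row.
  def validate(record)
    raise ArgumentError, "email is blank" if record[:email].empty?
  end

  def persist(record, context:)
    STORE << record.merge(imported_by: context[:user])
  end
end

def run_import(target_class, rows, context:)
  target = target_class.new # one instance per import run
  failures = []
  rows.each do |row|
    record = target.transform(row)
    target.validate(record)
    target.persist(record, context: context)
  rescue StandardError => e
    target.on_error(row, e, context: context)
    failures << { row: row, error: e.message }
  end
  failures
end

failures = run_import(
  GuestTarget,
  [{ email: "  Ana@Example.com " }, { email: "" }],
  context: { user: "admin" }
)
p GuestTarget::STORE # one record, email normalized to "ana@example.com"
p failures.size      # 1
```

Note that NotImplementedError descends from ScriptError, not StandardError, so the rescue in this sketch would not swallow a missing persist; the run would abort, which seems the right outcome for a programming error.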

Step 3 -- The Registry

Targets need to be discoverable. The engine's UI shows a dropdown of available import types; the Orchestrator looks up a target by key to process an import. The Registry is the central index.

# lib/data_porter/registry.rb
module DataPorter
  class TargetNotFound < Error; end

  module Registry
    @targets = {}

    class << self
      def register(key, klass)
        @targets[key.to_sym] = klass
      end

      def find(key)
        @targets.fetch(key.to_sym) do
          raise TargetNotFound, "Target '#{key}' not found"
        end
      end

      def available
        @targets.map do |key, klass|
          { key: key, label: klass._label, icon: klass._icon }
        end
      end

      def clear
        @targets = {}
      end
    end
  end
end

The Registry is a module with module-level state -- essentially a singleton hash. register adds a target class under a symbolic key. find retrieves it, raising a custom TargetNotFound error instead of a generic KeyError so the controller can rescue it and respond with a proper 404. available returns lightweight summaries for the UI: just the key, label, and icon, without exposing the full class.
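The following stand-alone sketch exercises the three public methods. Two assumptions are baked in: DataPorter::Error is taken to be a StandardError subclass defined earlier in the series, and the GuestTarget here is a hypothetical stub that exposes only the readers available needs.

```ruby
# Stand-alone copy of the Registry for experimentation.
# Assumption: DataPorter::Error is a StandardError subclass.
module DataPorter
  class Error < StandardError; end
  class TargetNotFound < Error; end

  module Registry
    @targets = {}

    class << self
      def register(key, klass)
        @targets[key.to_sym] = klass
      end

      def find(key)
        @targets.fetch(key.to_sym) do
          raise TargetNotFound, "Target '#{key}' not found"
        end
      end

      def available
        @targets.map { |key, klass| { key: key, label: klass._label, icon: klass._icon } }
      end
    end
  end
end

# Hypothetical stub exposing only what `available` reads.
class GuestTarget
  def self._label = "Guests"
  def self._icon  = "fas fa-users"
end

DataPorter::Registry.register("guests", GuestTarget) # string keys are symbolized
p DataPorter::Registry.find(:guests) == GuestTarget  # true
p DataPorter::Registry.available

begin
  DataPorter::Registry.find(:products)
rescue DataPorter::TargetNotFound => e
  puts e.message # Target 'products' not found
end
```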

Registration happens in the host app's initializer:

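# config/initializers/data_porter.rb
# to_prepare runs once autoloading is ready, and again after each code reload
Rails.application.config.to_prepare do
  DataPorter::Registry.register(:guests, GuestTarget)
  DataPorter::Registry.register(:products, ProductTarget)
end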

We considered auto-discovery (scanning a directory for Target subclasses), but explicit registration is simpler to reason about: you can see exactly which targets are active, control ordering, and conditionally register based on environment or feature flags. The clear method supports testing -- wipe the registry in a before block so each spec starts clean.

Decisions & tradeoffs

Decision	We chose	Over	Because
DSL placement	Class methods on a base class	Instance methods or a configuration hash	Class methods read like declarations, not procedure calls. They execute once at load time, not per-import
State storage	Class instance variables (`@_label`)	`class_attribute` from ActiveSupport	Class instance variables don't leak to subclasses by default, avoiding surprising inheritance behavior. We don't need the per-instance override that `class_attribute` provides
Column definition	`Struct` with keyword init	Plain hash or full ActiveModel class	Struct gives us named attributes, value equality, and `to_h` with no extra dependencies. A hash would lose the interface; ActiveModel would be overkill for a value object
Hook pattern	Instance methods with default no-ops	Event system or callback chain	Override-and-call is the simplest extension model. No subscription management, no ordering concerns. If you need it, override it
Registry	Explicit `register` calls	Auto-discovery via `inherited` hook or directory scanning	Explicit registration is visible, testable, and doesn't depend on load order or file system conventions. Auto-discovery can be added later as sugar on top
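For comparison, the auto-discovery alternative mentioned in the last row could look roughly like the sketch below. It is purely hypothetical: AutoTarget, the REGISTRY constant, and the key-derivation rule are invented for this example and are not part of DataPorter.

```ruby
# Hypothetical sketch of auto-registration via Class#inherited.
REGISTRY = {}

class AutoTarget
  def self.inherited(subclass)
    super
    # Anonymous classes (e.g. Class.new in specs) have no name; skip them.
    return if subclass.name.nil?

    # "GuestTarget" -> :guest (pluralizing would need ActiveSupport)
    key = subclass.name.split("::").last.sub(/Target\z/, "").downcase.to_sym
    REGISTRY[key] = subclass
  end
end

class GuestTarget < AutoTarget; end
class ProductTarget < AutoTarget; end
anonymous = Class.new(AutoTarget) # not registered

p REGISTRY.keys # [:guest, :product]
```

The sketch also shows the cost named in the table: a target is registered only when its file is loaded, so with lazy autoloading the dropdown could be missing entries until each class has been referenced at least once.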

Testing it

Target specs verify both the DSL declarations and the default hook behavior:

# spec/data_porter/target_spec.rb
let(:target_class) do
  Class.new(DataPorter::Target) do
    label "Guests"
    model_name "Guest"
    sources :csv, :json

    columns do
      column :first_name, type: :string, required: true
      column :email,      type: :email
    end

    csv_mapping do
      map "Prenom" => :first_name
    end
  end
end

it "sets the label" do
  expect(target_class._label).to eq("Guests")
end

it "defines columns" do
  expect(target_class._columns.size).to eq(2)
end

it "persist raises NotImplementedError" do
  expect { target_class.new.persist(nil, context: nil) }
    .to raise_error(NotImplementedError)
end

Registry specs confirm lookup, error handling, and the available summary:

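# spec/data_porter/registry_spec.rb
# target_class is assumed to be an anonymous Target declaring label "Guests" and icon "fas fa-users"
before { described_class.clear }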

it "stores a target by key" do
  described_class.register(:guests, target_class)
  expect(described_class.find(:guests)).to eq(target_class)
end

it "raises TargetNotFound for unknown keys" do
  expect { described_class.find(:unknown) }
    .to raise_error(DataPorter::TargetNotFound)
end

it "returns target summaries" do
  described_class.register(:guests, target_class)
  result = described_class.available
  expect(result).to contain_exactly(
    { key: :guests, label: "Guests", icon: "fas fa-users" }
  )
end

Both suites run without a database. Anonymous classes (Class.new(DataPorter::Target)) let us define fresh targets per test without defining global constants that would leak between examples.

Recap

The Target base class uses class methods as a declarative DSL: label, model_name, columns, csv_mapping, and deduplicate_by. Each stores its value in a class instance variable, keeping subclasses isolated.
The Column struct is a lightweight value object that captures name, type, required flag, display label, and open-ended options. Struct gives us equality and to_h for free.
Instance method hooks (transform, validate, persist, after_import, on_error) separate runtime behavior from static declaration. Only persist is mandatory; the rest default to no-ops.
The Registry is an explicit registration system that maps symbolic keys to target classes, providing lookup for the Orchestrator and summaries for the UI.

Next up

We have data models (part 4) and a target DSL (this part) that describes what each import expects. In part 6, we will wire them together with the Source layer -- starting with CSV parsing, ActiveStorage file handling, and automatic column mapping from CSV headers to target columns. That is where the first end-to-end flow comes together: upload a file, parse it, and see structured records.

This is part 5 of the series "Building DataPorter - A Data Import Engine for Rails". Previous: Modeling import data with StoreModel & JSONB | Next: Parsing CSV data with Sources

GitHub: SerylLns/data_porter | RubyGems: data_porter