Publishing to RubyGems & Retrospective
From bundle gem to gem push: looking back at 14 articles, 20 components, and the lessons learned building a Rails engine from scratch with TDD.
Context
This is the final article in the series where we build DataPorter, a mountable Rails engine for data import workflows. In part 14, we added Dry Run mode -- the last safety net before data touches the database.
We started this series with a question: why do we keep rebuilding the same import workflow in every Rails app? Fourteen articles later, we have a published gem that answers it. This article covers the last mile -- publishing to RubyGems -- then steps back to look at what we built, what we learned, and what we would do differently.
Publishing the gem
The gemspec
The interesting parts of the gemspec are not the metadata -- they are the constraints:
```ruby
# data_porter.gemspec
Gem::Specification.new do |spec|
  spec.name    = "data_porter"
  spec.version = DataPorter::VERSION

  spec.required_ruby_version = ">= 3.2.0"
  spec.metadata["rubygems_mfa_required"] = "true"

  spec.add_dependency "csv"
  spec.add_dependency "phlex", ">= 1.0"
  spec.add_dependency "rails", ">= 7.0"
  spec.add_dependency "store_model", ">= 2.0"
  spec.add_dependency "turbo-rails", ">= 1.0"
end
```
rubygems_mfa_required enforces multi-factor authentication for publishing -- a standard for any serious open-source gem. required_ruby_version at >= 3.2.0 excludes unmaintained Ruby versions. Runtime dependencies are intentionally wide (>= 1.0, >= 7.0) to avoid locking host apps to specific versions.
The spec.files filter excludes dev files (spec/, bin/, .github/) so the published gem only contains production code. Nobody wants to download 2 MB of specs when installing a gem.
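The article does not show the filter itself; a common pattern (a sketch of the idea, not necessarily DataPorter's exact code) lists the files tracked by Git and rejects the development paths:

```ruby
# data_porter.gemspec (excerpt) -- hypothetical sketch of the files filter
spec.files = Dir.chdir(__dir__) do
  `git ls-files -z`.split("\x0").reject do |f|
    f.start_with?("spec/", "bin/", ".github/")
  end
end
```

Because the list comes from `git ls-files`, anything ignored by Git is automatically excluded too.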
Versioning
DataPorter follows semantic versioning:
- 0.1.0: first release. The 0.x signals that the API may still evolve.
- 0.x.y: each new feature increments minor, each bugfix increments patch.
- 1.0.0: comes when the API is stabilized and battle-tested in production across multiple apps.
The version number lives in a single file:
```ruby
# lib/data_porter/version.rb
module DataPorter
  VERSION = "0.1.0"
end
```
One place to update. The gemspec reads it via require_relative. The CHANGELOG references it. The Git tag matches it. No duplication.
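The gemspec side of that single source of truth is a require_relative at the top of the file -- the standard pattern, sketched here:

```ruby
# data_porter.gemspec (top of file)
require_relative "lib/data_porter/version"

Gem::Specification.new do |spec|
  spec.version = DataPorter::VERSION # read from the one canonical file
end
```

Since the gemspec is evaluated from the project root, the relative path resolves without the gem being installed.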
The release workflow
```shell
# 1. Update version
#    lib/data_porter/version.rb -> VERSION = "0.1.0"

# 2. Update CHANGELOG
#    CHANGELOG.md -> ## [0.1.0] - 2026-02-06

# 3. Commit, tag, push
git add -A && git commit -m "Release v0.1.0"
git tag v0.1.0
git push origin master --tags

# 4. Build and push
gem build data_porter.gemspec
gem push data_porter-0.1.0.gem
```
Or, if the Rakefile includes bundler/gem_tasks:
```shell
bundle exec rake release
```
This single command builds, tags, pushes to Git, and pushes to RubyGems -- guaranteeing the tag and the gem stay in sync.
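The Rakefile wiring that enables this is a single require (assuming the standard Bundler setup):

```ruby
# Rakefile
require "bundler/gem_tasks" # provides the build, install, and release tasks
```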
Documentation
A gem without documentation is a gem nobody will use. DataPorter relies on three layers:
The README: entry point. Install in one command (rails generate data_porter:install), a 15-line Target example, the three-step workflow diagram. A developer should understand what the gem does and install it in under 5 minutes.
The CHANGELOG: every release documented with what changed, what was added, what broke. Keep a Changelog format -- a standard the Ruby community knows.
Inline comments: every public method documented with YARD. The DSL is the critical part -- column, sources, csv_mapping, persist need examples, because that is what users will read most.
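As an illustration of the style (the class and method body here are a simplified sketch, not DataPorter's actual implementation), a YARD-documented column declaration might look like this:

```ruby
# Simplified sketch of a YARD-documented DSL method.
class ColumnDSL
  def initialize
    @columns = {}
  end

  attr_reader :columns

  # Declares an importable column on the target.
  #
  # @param name [Symbol] the attribute name on the model
  # @param type [Symbol] a validated type (:string, :email, :phone, :url)
  # @param required [Boolean] whether a blank value is an error
  # @return [void]
  #
  # @example A required email column
  #   column :email, type: :email, required: true
  def column(name, type: :string, required: false)
    @columns[name] = { type: type, required: required }
  end
end
```

The @example tags matter most: they are what shows up first in generated docs, and what users copy-paste.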
What we built
Here is the complete list of components that make up DataPorter, in the order we built them:
| # | Component | Role |
|---|---|---|
| 1 | Engine + isolate_namespace | Gem structure, namespace isolation |
| 2 | Configuration DSL | DataPorter.configure, defaults, context_builder |
| 3 | StoreModels (ImportRecord, Error, Report) | Typed JSONB structures without extra tables |
| 4 | TypeValidator | Type validation (email, phone, url) on columns |
| 5 | Target DSL | label, model, columns, sources, persist |
| 6 | Registry | Auto-discovery and resolution of targets |
| 7 | Source::Base + Source::CSV | Source abstraction, CSV parsing with mapping |
| 8 | DataImport model | ActiveRecord, enum status, polymorphic user |
| 9 | Orchestrator | Coordinates parse/import, per-record error handling |
| 10 | RecordValidator | Generic validations (required, type) |
| 11 | ParseJob + ImportJob | Background processing via ActiveJob |
| 12 | Broadcaster + ImportChannel | Real-time progress via ActionCable |
| 13 | 6 Phlex components | StatusBadge, SummaryCards, PreviewTable, ProgressBar, ResultsSummary, FailureAlert |
| 14 | Stimulus controller | Client-side progress bar animation |
| 15 | ImportsController | Dynamic inheritance, 7 actions, Turbo integration |
| 16 | Install generator | Migration, initializer, routes, importers directory |
| 17 | Target generator | Target scaffolding with column parsing |
| 18 | Source::JSON | Import from JSON file or raw text |
| 19 | Source::API | Import from HTTP endpoint with auth and params |
| 20 | Dry Run | Transaction + rollback, enriches records with DB errors |
Twenty components. Each with its specs. Each with an article explaining why it exists and how it works.
Lessons learned
TDD without a dummy app
The most consequential decision of the series: testing the engine without creating a Rails application in spec/dummy/. Instead, a 60-line spec_helper.rb bootstraps in-memory SQLite, configures load paths, and stubs ApplicationController. It works, and it works well -- the full suite runs in under a second.
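The shape of that bootstrap, condensed to its essentials (a sketch of the approach; table and column names here are illustrative, not DataPorter's exact schema):

```ruby
# spec/spec_helper.rb -- condensed sketch of the no-dummy-app bootstrap
require "active_record"
require "action_controller"

# In-memory SQLite: created fresh on every suite run, gone afterwards.
ActiveRecord::Base.establish_connection(adapter: "sqlite3", database: ":memory:")

# Schema declared inline -- it must be kept in sync with the migration template.
ActiveRecord::Schema.define do
  create_table :data_porter_imports do |t|
    t.integer :status, default: 0
    t.text :records
    t.timestamps
  end
end

# Minimal stand-in so controller specs can verify inheritance and callbacks.
class ApplicationController < ActionController::Base; end
```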
The unexpected benefit: this constraint forces every component to stay decoupled. If a component needs a router to be tested, that is a signal it is too tightly coupled to the framework. Structural controller tests (verifying inheritance, callbacks, method signatures) felt strange at first. In hindsight, they test exactly what the gem owns -- the wiring -- and leave integration testing to the host app.
The trap to avoid: duplication between the schema in spec_helper.rb and the migration template. If the two diverge, tests pass but the generated migration does not match what was tested.
StoreModel gotchas
StoreModel is powerful, but it has its subtleties:
Dirty tracking: when you modify an object inside a store_model attribute, ActiveRecord does not detect the change. You can set data_import.records.first.status = "complete" and call save -- nothing gets persisted. The fix: call records_will_change! before modifying, or reassign the entire attribute.
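The mechanism is easy to reproduce outside Rails: ActiveModel-style dirty tracking records changes at assignment time (via the attribute writer), not by deep comparison. A plain-Ruby toy model (an illustration of the principle, not StoreModel's code) shows why in-place mutation is invisible:

```ruby
# Toy model with assignment-based change tracking, like ActiveModel::Dirty.
class ToyRecord
  def initialize(records)
    @records = records
    @changed = []
  end

  attr_reader :records

  # Assignment goes through the writer, which records the change...
  def records=(value)
    @changed << :records
    @records = value
  end

  # ...but in-place mutation (records.first[:status] = "x") never calls
  # the writer, so save sees nothing to persist.
  def changed?
    !@changed.empty?
  end

  # The escape hatch: declare the change manually before mutating.
  def records_will_change!
    @changed << :records
  end
end
```

The two fixes from the article map directly onto this model: call records_will_change! before mutating, or build a new array and assign it through the writer.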
Serialization round-trip: symbol keys become string keys after save/reload. { name: "Alice" } comes back as { "name" => "Alice" }. You need to know this and code accordingly -- either always use string keys, or call symbolize_keys on the way out. DataPorter does the latter in ImportRecord#attributes.
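The round-trip is observable with plain JSON serialization, which is what the JSONB column does under the hood:

```ruby
require "json"

before = { name: "Alice", email: "alice@example.com" }

# Symbol keys survive generation but come back from parsing as strings.
after = JSON.parse(JSON.generate(before))
# after == { "name" => "Alice", "email" => "alice@example.com" }

# Symbolizing on the way out restores the original shape.
restored = after.transform_keys(&:to_sym)
```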
SQLite vs PostgreSQL: in tests, StoreModel columns are text. In production, they are jsonb. StoreModel handles the difference transparently, but certain JSONB queries (indexes, contains) cannot be tested in SQLite. An acceptable tradeoff for the speed of the feedback loop.
Phlex in an engine: plain vs text
A Phlex-specific trap: to emit raw text inside an element, you must use plain (not text). Earlier Phlex versions had text, which was later renamed to plain -- so if you use text with a recent version, you get a cryptic NoMethodError.
The other subtlety: calling super() in every component's initialize. Phlex requires it, and forgetting it produces silent errors or empty renders.
Testing patterns: controllers, channels, JS
Testing JavaScript from Ruby by reading the file as text and asserting on strings -- it sounds hacky. In practice, it catches the most common bug class in an engine: misalignment between Ruby and JS code. The channel is called DataPorter::ImportChannel in Ruby and "DataPorter::ImportChannel" in JS. If one changes and the other does not, the test fails. For a single 30-line Stimulus file, that beats adding Jest and node_modules to the project.
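The pattern itself is a few lines of Ruby: read the JS file, assert on the strings that must stay aligned with the Ruby side. A self-contained sketch (using a temp file to stand in for the real Stimulus controller):

```ruby
require "tempfile"

# Stand-in for the engine's JS file that subscribes to the channel.
js = Tempfile.new(["import_channel", ".js"])
js.write(<<~JS)
  import consumer from "./consumer"

  consumer.subscriptions.create(
    { channel: "DataPorter::ImportChannel", import_id: this.idValue },
    { received(data) { this.update(data) } }
  )
JS
js.rewind
source = js.read

# The assertion that catches Ruby/JS drift: the channel name string in JS
# must match the Ruby class name exactly.
raise "channel name drift" unless source.include?('"DataPorter::ImportChannel"')
```

In the real spec the path would point at the file shipped in the gem, and there would be a handful of these assertions -- channel name, stream key, event names.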
Structural controller tests (_process_action_callbacks, instance_method, superclass) form a contract: the gem guarantees the controller has the right shape. The host app guarantees it behaves correctly in context. A clean separation of responsibilities.
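The technique needs no Rails to demonstrate: superclass and instance_method are plain Ruby reflection. A minimal sketch with stand-in classes (the action names are illustrative):

```ruby
# Stand-ins for the engine's base controller and a host-app subclass.
class BaseImportsController
  def index; end
  def create(params); end
end

class ImportsController < BaseImportsController
end

# Structural assertions: the shape of the contract, not its behavior.
raise unless ImportsController.superclass == BaseImportsController
raise unless ImportsController.instance_method(:create).arity == 1
raise unless BaseImportsController.instance_methods(false).include?(:index)
```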
What is next
DataPorter 0.1.0 covers the standard workflow. Here is what could come in future versions:
Batch imports: for 100k+ line files, import in batches of 1000 with insert_all instead of create! per record. This requires rethinking the persist contract -- instead of one record at a time, the target would receive a batch.
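That batched contract could be sketched like this (a hypothetical API, with insert_all simulated by a collector block):

```ruby
# Hypothetical batched persist: the target receives arrays of attribute
# hashes, sized for insert_all, instead of one record at a time.
BATCH_SIZE = 1000

def persist_in_batches(records, batch_size: BATCH_SIZE, &insert_all)
  records.each_slice(batch_size, &insert_all)
end

# Simulate insert_all by recording the size of each batch it receives.
batches = []
rows = Array.new(2500) { |i| { email: "user#{i}@example.com" } }
persist_in_batches(rows) { |batch| batches << batch.size }
# batches == [1000, 1000, 500]
```

The tradeoff: insert_all skips ActiveRecord validations and callbacks, so per-record error reporting would have to move into the validation pass that runs before persistence.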
Streaming progress: replace ActionCable with Server-Sent Events (SSE) for apps that do not need bidirectional WebSocket. Lighter, no Redis dependency.
Custom validators: let targets declare validators with a DSL:
```ruby
columns do
  column :email, type: :email, required: true, validate: ->(val) {
    "already exists" if User.exists?(email: val)
  }
end
```
Export: the reverse path. If we can parse and validate records, we can serialize them to CSV/JSON. The Target already has all the information needed (columns, types, labels).
Final reflection
Building DataPorter was an exercise in discipline as much as code. The method -- Taskmaster for planning, TDD for implementation, one article to document each step -- forces explicit decisions. No "we will figure it out later". Every component exists because a test demands it, and every test exists because a behavior was specified.
The choice to skip the dummy app was a gamble. It paid off: tests are fast, components are decoupled, and the gem is testable without Rails infrastructure. But it has a cost -- some integration bugs will only surface in the host app. That is an accepted tradeoff: the gem tests its wiring, the host app tests its behavior.
StoreModel, Phlex, Stimulus -- each dependency brought its share of surprises. StoreModel's dirty tracking, Phlex's plain vs text, Stimulus's double-dash naming for engines. These gotchas appear in no documentation. They appear when a test fails at 11 PM and you read the gem's source code to understand why. That is the real advantage of TDD: you discover problems in the terminal, not in production.
DataPorter is now a published gem on RubyGems. One bundle add data_porter, one rails generate data_porter:install, a 15-line Target, and any Rails app has a complete import system with preview, validation, real-time progress, and dry run.
That was the plan from the start. It took 16 articles to get there.
This is part 16 of the series "Building DataPorter - A Data Import Engine for Rails". Previous: ERB Views Meet Phlex Components
GitHub: SerylLns/data_porter | RubyGems: data_porter