Damian Domżalski

How I Built an AI Selfie Analyzer with Rails 8 + Claude Vision API

I built FirstVibe -- an AI-powered selfie analyzer that gives you a "vibe check." Upload a photo and within 30 seconds you get personality scores, a celebrity lookalike, aura type, dating energy, red/green flags, and a bunch of fun predictions. Think "what do people actually think when they first see you?" powered by Claude's vision capabilities.

It has processed over 6,000 selfie analyses so far. Here is how it works under the hood.

The Stack

  • Rails 8.0.4 with Ruby 3.3.0
  • PostgreSQL with UUID primary keys everywhere
  • Hotwire (Turbo + Stimulus) -- no React, no SPA
  • Tailwind CSS -- dark theme, no custom CSS beyond a few animations
  • Propshaft + Importmap -- zero Node.js, zero Webpack
  • Solid Queue -- DB-backed job queue running inside the Puma process
  • Claude Sonnet 4 + Haiku 4.5 via the ruby-anthropic gem (Vision API)
  • OpenAI API for AI caricature generation
  • Stripe Checkout for payments
  • Render.com for deployment

The "no Node.js" part is not an accident. Rails 8 with Importmap and Propshaft means I have zero JavaScript build steps. The entire frontend is Stimulus controllers, Turbo Streams, and ERB templates. It keeps things simple.

Architecture: Progressive Rendering with Parallel API Calls

The core challenge was making AI analysis feel fast. Claude Vision is powerful but not instant -- a single call with an image can take 8-15 seconds. Users staring at a blank loading screen for that long would bounce.

My solution: split the analysis into two parallel API calls and stream results progressively as they complete.

class ClaudeVisionService
  # Haiku handles the cheaper details call; the core call falls back to
  # the service's default Sonnet model.
  DETAILS_MODEL = "claude-haiku-4-5-20251001"
  def analyze_progressive(on_core_complete:, on_details_complete:)
    image_base64 = download_and_encode
    overused = AiOutputFrequency.overused_values(@scan.locale)

    core_prompt = build_prompt(:core, overused)
    details_prompt = build_prompt(:details, overused)

    # Each thread reports back through its own queue, so the caller can
    # block on results in order without polling.
    core_queue = Queue.new
    details_queue = Queue.new

    Thread.new do
      response = call_claude(image_base64, core_prompt)
      core_queue.push([:ok, parse_response(response)])
    rescue => e
      core_queue.push([:error, e])
    end

    Thread.new do
      response = call_claude(image_base64, details_prompt, model: DETAILS_MODEL)
      details_queue.push([:ok, parse_response(response)])
    rescue => e
      details_queue.push([:error, e])
    end

    # Block until the core call finishes; a core failure aborts the scan.
    core_status, core_payload = core_queue.pop
    raise core_payload if core_status == :error
    on_core_complete.call(core_payload)

    details_status, details_payload = details_queue.pop
    if details_status == :error
      on_details_complete.call(nil, details_payload)
    else
      on_details_complete.call(details_payload, nil)
    end
  end
end

Call 1 (Core) uses Claude Sonnet and returns the main score, vibe label, tags, first impression, and all five category scores. This is the "above the fold" content -- what users see first.

Call 2 (Details) uses Claude Haiku 4.5 (cheaper, faster) and returns everything else: celebrity match, dating energy, theme song, vibe animal, red/green flags, tips, etc. This is the paywall content, so it can arrive a few seconds later without impacting perceived performance.

When core results arrive, the background job broadcasts a Turbo Stream update immediately:

# In AnalyzeScanJob
service.analyze_progressive(
  on_core_complete: ->(core_results) {
    scan.update!(
      results: core_results.merge("_details_pending" => true),
      status: :completed,
      completed_at: Time.current
    )
    Turbo::StreamsChannel.broadcast_update_to(
      "scan_#{scan.id}",
      target: "scan_content",
      partial: "scans/results_card",
      locals: { scan: scan, caricature_loading: true }
    )
  },
  on_details_complete: ->(details_results, error) {
    # Merge into existing results, broadcast again
  }
)

The user's browser is subscribed to the Turbo Stream channel, so results appear in real-time without polling. I also run a Stimulus-based polling fallback at 3-second intervals for cases where WebSockets drop.

In parallel with all of this, a third job (GenerateCaricatureJob) kicks off an OpenAI image generation request for the AI caricature.

Result: the user sees their core score and personality breakdown within 8-12 seconds, then detailed results fill in over the next few seconds. It feels snappy.

Image Pipeline

Every uploaded photo gets compressed before storage or AI analysis:

class ImageCompressor
  MAX_DIMENSION = 768
  JPEG_QUALITY = 70

  def self.compress(uploaded_file)
    result = ImageProcessing::Vips
      .source(uploaded_file.tempfile)
      .resize_to_limit(MAX_DIMENSION, MAX_DIMENSION)
      .saver(quality: JPEG_QUALITY, strip: true)
      .convert("jpeg")
      .call

    base64 = Base64.strict_encode64(result.read)
    result.close
    [base64, "image/jpeg"]
  end
end

Vips is fast and memory-efficient. Resizing to 768px and compressing to 70% JPEG quality reduces the image size significantly (often from 3-5MB to 80-150KB) without meaningful quality loss for AI analysis. The compressed image is cached in Redis for 10 minutes so the Vision API call does not need to download from S3 again.
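The caching itself is a plain cache-aside pattern. Here is a minimal, self-contained sketch of the idea (the real app uses `Rails.cache` backed by Redis; `ImageCache` is a name I made up for illustration):

```ruby
# Cache-aside sketch: return the compressed image if a fresh copy exists,
# otherwise compute it (download + compress) and store it with a TTL.
class ImageCache
  TTL = 600 # seconds (the article's 10-minute window)

  def initialize
    @store = {} # key => [value, expires_at]
  end

  def fetch(key)
    value, expires_at = @store[key]
    return value if value && Time.now < expires_at # hit: skip the S3 download

    fresh = yield # miss: download from S3 and compress
    @store[key] = [fresh, Time.now + TTL]
    fresh
  end
end
```

With `Rails.cache` this collapses to a single `Rails.cache.fetch(key, expires_in: 10.minutes) { ... }` call; the sketch just makes the hit/miss branches explicit.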

The upload form itself does client-side resizing to 1000px via a Canvas element in a Stimulus controller, so the server-side compression is a second pass. Belt and suspenders.

JSONB for AI Results

All AI analysis results live in a single JSONB column (scans.results). No rigid schema, no migrations when I add a new analysis field.

# Free users see limited results
def visible_results
  return results if is_unlocked?

  free_categories = %w[attractiveness confidence]
  categories_preview = results["categories"]&.each_with_object({}) do |(key, val), hash|
    if free_categories.include?(key)
      hash[key] = { "score" => val["score"], "note" => nil }
    else
      hash[key] = { "score" => nil, "note" => nil }
    end
  end

  {
    "overall_score" => results["overall_score"],
    "vibe_label" => results["vibe_label"],
    "vibe_tags" => results["vibe_tags"],
    "first_impression" => truncate_to_first_sentence(results["first_impression"]),
    "categories" => categories_preview,
    "celebrity_match" => nil,
    "aura_type" => results["aura_type"].present? ?
      { "name" => results["aura_type"]["name"], "hex" => results["aura_type"]["hex"], "why" => nil } : nil,
    # ... everything else nil/locked
  }
end

The visible_results method controls the free/paid boundary in one place. Free users see the overall score, vibe label, 2 of 5 category scores (attractiveness and confidence -- the hooks), and a truncated first impression. Everything else is locked behind a $1.99-$2.49 paywall.
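The `truncate_to_first_sentence` helper is referenced above but not shown; a plausible sketch (my guess at the implementation, not the app's actual code) is a lazy regex match up to the first sentence-ending punctuation:

```ruby
# Hypothetical implementation of the truncation helper. Naive on purpose:
# it will also stop at abbreviations like "Dr." -- fine for a teaser.
def truncate_to_first_sentence(text)
  return text if text.nil?

  match = text.match(/\A.*?[.!?](?=\s|\z)/m)
  match ? match[0] : text
end
```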

Solving AI Output Repetition

One problem I did not anticipate: Claude would fall into patterns. After a few hundred scans, way too many people were getting "Timothée Chalamet" as their celebrity match or "Golden Hour Aura" as their aura type.

My solution was an AiOutputFrequency tracking system:

class AiOutputFrequency < ApplicationRecord
  TRACKED_FIELDS = %w[celebrity_match vibe_label vibe_animal aura_type].freeze
  WINDOW = 100
  THRESHOLD = 0.15

  def self.overused_values(locale)
    overused = {}
    TRACKED_FIELDS.each do |field|
      recent = where(field_name: field, locale: locale)
               .order(created_at: :desc).limit(WINDOW).pluck(:field_value)
      total = recent.size.to_f
      next if total < 10

      recent.tally.each do |value, count|
        next unless count / total > THRESHOLD
        (overused[field] ||= []) << value
      end
    end
    overused
  end
end

Before each analysis, the service checks the last 100 outputs per field per locale. Anything appearing more than 15% of the time gets injected into the prompt as a "DIVERSITY RULE -- do NOT use these recently overused values." It works well. Celebrity matches and vibe labels are now genuinely varied.
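Turning that hash into a prompt fragment is straightforward; a sketch of what the injection might look like (`build_diversity_rule` is my name, not necessarily the app's):

```ruby
# Format the overused-values hash from AiOutputFrequency.overused_values
# into a block of prompt text. Empty hash => no rule injected.
def build_diversity_rule(overused)
  return "" if overused.empty?

  lines = overused.map { |field, values| "- #{field}: #{values.join(', ')}" }
  "DIVERSITY RULE -- do NOT use these recently overused values:\n" + lines.join("\n")
end
```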

Session-Based Identity (No Auth)

There are no user accounts. No sign-up, no login, no OAuth. Identity is a signed cookie:

module VisitorTrackable
  included do
    before_action :ensure_visitor_id
  end

  def ensure_visitor_id
    cookies.signed[:visitor_id] ||= {
      value: SecureRandom.uuid,
      expires: 30.days.from_now,
      httponly: true,
      secure: Rails.env.production?
    }
  end
end

This was a deliberate choice. The product is a quick, fun interaction -- forcing account creation before seeing results would tank conversions. The visitor ID links to scans, payments, experiment assignments, and analytics events. If someone pays, they can optionally save an email to access their results later.

Rate Limiting

Rack::Attack handles all rate limiting with a dedicated MemoryStore so counters do not compete with the application cache:

  • Scan creation: 3/hour per IP (AI calls are expensive)
  • Checkout: 10/hour per IP
  • OTP verification: 10/hour per IP
  • Admin panel: 5/minute per IP
  • General: 60 requests/minute per IP
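In initializer form, those limits might look roughly like this (a hypothetical `config/initializers/rack_attack.rb`; the route paths are assumptions, but the `throttle` API is Rack::Attack's real one):

```ruby
# Dedicated in-memory store so throttle counters do not evict app cache entries.
Rack::Attack.cache.store = ActiveSupport::Cache::MemoryStore.new

# Scan creation: 3/hour per IP -- AI calls are expensive.
Rack::Attack.throttle("scans/ip", limit: 3, period: 1.hour) do |req|
  req.ip if req.post? && req.path == "/scans"
end

# Checkout: 10/hour per IP.
Rack::Attack.throttle("checkout/ip", limit: 10, period: 1.hour) do |req|
  req.ip if req.path.start_with?("/checkout")
end

# General safety net: 60 requests/minute per IP.
Rack::Attack.throttle("general/ip", limit: 60, period: 1.minute) do |req|
  req.ip
end
```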

Cost Optimization: Prompt Caching

Claude's prompt caching (cache_control: { type: "ephemeral" }) saves significant money. The system message and prompt template are cached, so repeated calls only pay for the image input and the unique response. With the system prompt being ~500 tokens and the analysis prompt being ~2,000 tokens, this cuts input token costs meaningfully across thousands of scans.

def call_claude(image_base64, prompt, model: nil)
  api_client.call(
    model: model, # nil falls through to the service's default Sonnet model
    system: [
      { type: "text", text: SYSTEM_MESSAGE, cache_control: { type: "ephemeral" } }
    ],
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt, cache_control: { type: "ephemeral" } },
          { type: "image", source: { type: "base64", media_type: @content_type, data: image_base64 } }
        ]
      }
    ]
  )
end

Background Jobs with Solid Queue

Solid Queue is underrated. It is a database-backed job queue that ships with Rails 8, and I run it inside the Puma process (SOLID_QUEUE_IN_PUMA=1). No Redis, no Sidekiq, no separate worker process. On Render.com this means a single instance handles everything -- web requests, background jobs, recurring tasks.
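The in-process setup is a one-liner in `config/puma.rb` (this matches what the Rails 8 generator emits, to the best of my knowledge):

```ruby
# config/puma.rb -- start the Solid Queue supervisor inside Puma
# when SOLID_QUEUE_IN_PUMA is set.
plugin :solid_queue if ENV["SOLID_QUEUE_IN_PUMA"]
```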

Recurring jobs handle cleanup and automation:

  • Clearing finished Solid Queue jobs (hourly)
  • Cleaning expired email verifications (daily)
  • Purging stale AI frequency records (daily)
  • Sending daily growth reports (8am UTC)

Deployment

The entire app runs on Render.com with no Docker. Build step: bundle install + assets:precompile + db:prepare. That is it. No container orchestration, no Kubernetes. A single web service with Solid Queue running in-process.

A/B Testing

I built a simple, deterministic A/B testing system. Variant assignment uses SHA256 hashing of "#{visitor_id}:#{experiment_name}" so it is stateless and consistent:

hash = Digest::SHA256.hexdigest("#{visitor_id}:#{experiment_name}").to_i(16)
bucket = hash % total_weight

Currently testing pricing ($1.99 vs $2.49), paywall headline copy (curiosity vs loss aversion), and paywall teaser styles (hard blur vs gradient fade). All configured in a single Ruby module, no external service needed.
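Fleshed out, the assignment might look like this (`assign_variant` and the weighted-variants hash are my names; the hashing mirrors the snippet above). The key property: no stored state, yet a given visitor always lands in the same variant.

```ruby
require "digest"

# Deterministic weighted assignment: hash the visitor + experiment name,
# reduce into the total weight, and walk cumulative weights to pick a variant.
def assign_variant(visitor_id, experiment_name, variants)
  total_weight = variants.values.sum
  hash = Digest::SHA256.hexdigest("#{visitor_id}:#{experiment_name}").to_i(16)
  bucket = hash % total_weight

  cumulative = 0
  variants.each do |name, weight|
    cumulative += weight
    return name if bucket < cumulative
  end
end
```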

Results After 6,000+ Scans

Some things I learned:

  1. Progressive rendering matters. The switch from "wait for everything" to "show core results immediately" noticeably reduced bounce rate during loading.
  2. Freemium works for impulse products. Showing the score for free and locking the details behind a paywall creates natural curiosity. People want to know their celebrity match.
  3. AI output diversity requires active management. Without the frequency tracking system, Claude would give 20% of users the same celebrity match.
  4. No-auth simplicity pays off. The upload-to-results flow takes 30 seconds. Adding a signup step would have killed conversions.
  5. Rails 8 with Hotwire is genuinely good for this. Real-time updates via Turbo Streams, no JavaScript framework, no build step. The entire JS footprint is Stimulus controllers.
  6. Prompt caching saves real money. At scale, caching the system message and prompt template with Claude API reduces costs significantly per scan.

What I Would Do Differently

  • Start with progressive rendering from day one instead of retrofitting it.
  • Build the AI output frequency tracking earlier -- the repetition problem is not obvious until you hit a few hundred scans.
  • Use a more structured error recovery system for the details call. Right now, failed detail calls get retried once and then flagged. A circuit breaker pattern would be cleaner.
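For reference, a minimal circuit breaker for that details call could look like this (entirely my sketch, not FirstVibe code): after a threshold of consecutive failures the circuit opens and calls fail fast until a cooldown passes.

```ruby
CircuitOpenError = Class.new(StandardError)

# Minimal circuit breaker: `threshold` consecutive failures open the
# circuit; calls then fail fast until `cooldown` seconds elapse.
class CircuitBreaker
  def initialize(threshold: 3, cooldown: 60)
    @threshold = threshold
    @cooldown = cooldown
    @failures = 0
    @opened_at = nil
  end

  def call
    raise CircuitOpenError, "circuit open" if open?
    result = yield
    @failures = 0      # success closes the circuit again
    @opened_at = nil
    result
  rescue CircuitOpenError
    raise
  rescue => e
    @failures += 1
    @opened_at = Time.now if @failures >= @threshold
    raise e
  end

  def open?
    !@opened_at.nil? && (Time.now - @opened_at) < @cooldown
  end
end
```

Usage would wrap the Haiku details call: `breaker.call { call_claude(image, details_prompt, model: DETAILS_MODEL) }`, with `CircuitOpenError` handled the same way as a failed details call today.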

If you want to try it: firstvibe.app. The basic vibe check is free. Happy to answer questions about any part of the architecture.
