DEV Community

Mohammad Waseem

Frontend Shakeup: Taming Fake Users in Production DB

The Problem

When a product grows, so do its test harnesses. It’s common to generate placeholder accounts in a CI environment to exercise user‑centric flows. Over time those records tend to seep into the production database, whether through accidental merges, misconfigured scripts, or abandoned data‑seeding jobs. The result is a noisy users table holding fifty or more placeholder accounts that look like real subscribers but carry no real usage data.

Technically, the issue is twofold:

  1. Data integrity. Fake rows inflate row counts, distort the statistics the query planner relies on, and waste storage.
  2. Audit & compliance. Production datasets often feed analytics, compliance reports, or security checks; spurious entries can lead to misleading insights or regulatory slip‑ups.

Why it matters

Fake users are one of the quickest ways for a QA pipeline to turn a tidy database into a cluttered mess. Every hour spent filtering out the noise is an hour not spent automating the next batch of regression tests. When the backing store fills up with decoys, automated end‑to‑end workflows stumble, and the QA team’s tedium turns into frustration.

The Solution

Below is a practical, step‑by‑step routine that addresses the problem while keeping the database production‑ready. All operations assume a PostgreSQL backend; for other RDBMSs the approach is the same in spirit.


1. Tag fake users explicitly

Add a lightweight boolean flag (is_test_user), or an equivalent enum column, to the users table and default it to false. Scripts that create mock accounts should set the flag to true and, if needed, record a created_at timestamp.

ALTER TABLE users ADD COLUMN is_test_user boolean NOT NULL DEFAULT false;

2. Seal the production seeding pipeline

Verify that seed‑data jobs can never target production. DATABASE_URL and equivalent environment variables should be isolated per cluster. A simple CI check can fail the build if the job tag contains the word “test” while the target environment is marked “prod”.
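The CI check described above could be sketched roughly as follows. This is a minimal illustration, not a drop-in script: the variable names JOB_TAG and TARGET_ENV are hypothetical, and the set of names treated as production is an assumption to adapt to your own pipeline.

```python
from typing import Mapping, Optional


def seeding_allowed(job_tag: str, target_env: str) -> bool:
    """Return False when a test-tagged job is aimed at a production target."""
    is_test_job = "test" in job_tag.lower()
    # Assumption: production environments are labelled "prod" or "production".
    is_prod_target = target_env.strip().lower() in {"prod", "production"}
    return not (is_test_job and is_prod_target)


def guard_message(environ: Mapping[str, str]) -> Optional[str]:
    """Return an error message if the job must be blocked, else None.

    JOB_TAG and TARGET_ENV are hypothetical names; map them to whatever
    your CI system actually exposes.
    """
    tag = environ.get("JOB_TAG", "")
    env = environ.get("TARGET_ENV", "")
    if seeding_allowed(tag, env):
        return None
    return f"Refusing to run job {tag!r} against {env!r}"
```

In a CI step you would call guard_message(os.environ) and exit with a non-zero status when it returns a message. Note that a plain substring match also flags tags such as "latest", so a stricter token match may suit some naming schemes better.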

3. Run a weekly housekeeping job

A scheduled job removes dormant fake accounts that have outlived the test cycle that created them. Keep the logic conservative: delete only rows with is_test_user = true and no recent activity (last_sign_in_at older than 90 days).

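-- Remove flagged test accounts that have been inactive for 90 days.
-- A NULL last_sign_in_at never satisfies "<", so fall back to created_at
-- (assumes the created_at timestamp from step 1 is present).
DELETE FROM users
WHERE is_test_user = true
  AND COALESCE(last_sign_in_at, created_at) < NOW() - INTERVAL '90 days';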

Batch the operation (e.g., 10k rows at a time) so that each transaction stays short and the locks on the affected rows are released quickly.
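One way to batch the delete is to remove a limited set of ids per transaction and loop until nothing is left. The sketch below uses Python's built-in sqlite3 module only so that it runs without a database server; the function name, the cutoff parameter, and the integer id primary key are assumptions. With a PostgreSQL driver such as psycopg the placeholders would be %s rather than ?.

```python
import sqlite3


def purge_stale_test_users(conn, cutoff, batch_size=10_000):
    """Delete flagged, inactive test users in batches; return rows removed.

    `cutoff` is an ISO-8601 timestamp string; rows whose last_sign_in_at
    sorts before it are treated as dormant.
    """
    total = 0
    while True:
        cur = conn.execute(
            """
            DELETE FROM users
            WHERE id IN (
                SELECT id FROM users
                WHERE is_test_user AND last_sign_in_at < ?
                LIMIT ?
            )
            """,
            (cutoff, batch_size),
        )
        conn.commit()  # one short transaction per batch
        if cur.rowcount == 0:
            return total
        total += cur.rowcount
```

Committing after every batch keeps each transaction small; the loop ends on the first batch that deletes nothing.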

4. Protect the flag during migrations

If a schema change touches the users table, make sure the migration keeps the is_test_user column along with its default and NOT NULL constraint. In a Rails migration, for example, the column is defined like this:

add_column :users, :is_test_user, :boolean, default: false, null: false

Add a comment to that column that explains its purpose for future developers.

5. Monitor and alert

Create a lightweight dashboard that surfaces the count of test accounts daily. If the number rises beyond a chosen threshold (e.g., 1% of total users), raise an incident.

SELECT COUNT(*) FILTER (WHERE is_test_user = true) AS test_user_count,
       COUNT(*) FILTER (WHERE is_test_user = false) AS real_user_count
FROM users;
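The two counts returned by the query can feed a simple threshold check. The helper below is a hedged sketch: the function names are hypothetical, and the default threshold simply mirrors the 1% example given above.

```python
def fake_user_ratio(test_user_count: int, real_user_count: int) -> float:
    """Share of all accounts that are flagged as test users."""
    total = test_user_count + real_user_count
    return test_user_count / total if total else 0.0


def should_alert(test_user_count: int, real_user_count: int,
                 threshold: float = 0.01) -> bool:
    """True when test accounts exceed the chosen share of total users."""
    return fake_user_ratio(test_user_count, real_user_count) > threshold
```

A daily job could run the query, pass both counts to should_alert, and open an incident whenever it returns True.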

6. Leakproof the test codebase

Add tests that deliberately try to create a fake user against a production configuration and assert that the attempt is refused. Use test doubles or database fixtures that never commit to production. If such a test fails, CI surfaces the violation immediately.
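A guard of this kind might look like the following sketch. Everything here is illustrative: the host name in PRODUCTION_HOSTS, the exception class, and the create_fake_user factory are hypothetical stand-ins for whatever your test suite actually uses.

```python
from urllib.parse import urlparse

# Hypothetical host name; replace with your real production database hosts.
PRODUCTION_HOSTS = {"db.prod.example.com"}


class ProductionWriteBlocked(RuntimeError):
    """Raised when test fixtures are pointed at a production database."""


def assert_safe_for_fixtures(database_url: str) -> None:
    """Refuse to continue if the URL's host is a known production host."""
    host = urlparse(database_url).hostname
    if host in PRODUCTION_HOSTS:
        raise ProductionWriteBlocked(f"fixtures must not run against {host}")


def create_fake_user(database_url: str, email: str) -> dict:
    """Build a flagged test user, but only for non-production targets."""
    assert_safe_for_fixtures(database_url)
    # A real factory would insert a row here; this sketch only builds the record.
    return {"email": email, "is_test_user": True}
```

A unit test can then call create_fake_user with a production URL and expect ProductionWriteBlocked, so that removing or weakening the guard breaks the build.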

7. Document the lifecycle

Add a README entry for the test‑data pipeline that covers the flag, the retention period, and the housekeeping job. Anyone touching the code will then see the safeguards up front, which reduces the chance of accidental insertion.


Wrap‑up

Taming fake users is a blend of disciplined schema design, cautious deployment pipelines, and regular data hygiene. By tagging records, automating clean‑ups, and monitoring for drift, QA teams can keep production clean and let automation focus on genuine quality work instead of chasing phantom accounts.

🛠️ The Tool I Use

    For generating clean test data and disposable emails for these workflows, I personally use [TempoMail USA](https://tempomailusa.com). It’s fast, has an API-like feel, and keeps my production data clean.
