DEV Community

red bean

You Don't Have 50,000 Users — How Ghost Profiles Pollute Your PostHog Data

You open PostHog. Persons tab says 50,000. You tell your team, your investors, your board: "We have 50,000 users."

You don't. You probably have 35,000. Maybe fewer.

The rest are ghosts — duplicate person records that PostHog created because its identity resolution isn't perfect. The same human, counted twice, three times, sometimes more. Different devices. Different browsers. An app session here, a web session there. PostHog sees each one as a new person.

Why this happens

PostHog links anonymous sessions to known users when you call posthog.identify(). In theory, this merges the anonymous profile with the identified one. In practice, it fails more often than you'd think:

  • Same person, two devices. They use your app on their phone and your website on their laptop. Two separate distinct_id values. If they don't log in on both, PostHog has no way to connect them. Two person records.

  • Same person, same device, cleared cookies. They visited your site in January. Cookies expired or got cleared. They came back in March. New distinct_id. PostHog creates a second person.

  • Race conditions in identify calls. PostHog's merge logic is eventual, not transactional. When identify events come in close together from different clients, the merge can silently fail. The result: two person records with the same email.

  • In-app browsers. Your app opens a link in a WebView. PostHog's web SDK sees a completely new session with no connection to the native app. One user, two people in PostHog.

This isn't theoretical. PostHog has had open GitHub issues about identity merging since 2020. They even ship $merge_dangerously, an escape hatch for merges the standard identify() flow refuses to perform, such as combining two persons that have both already been identified.
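To make the first two failure modes concrete, here is a toy model in Python. `ToyIdentityStore`, `capture`, and `identify` are hypothetical names. This is a sketch of the general idea (a distinct ID only joins a person when something explicitly links it), not a description of how PostHog actually stores or merges persons.

```python
from itertools import count


class ToyIdentityStore:
    """Hypothetical, heavily simplified distinct_id -> person model.

    Not PostHog's implementation: it only shows why a distinct_id that never
    passes through an identify() call stays a separate person record.
    """

    def __init__(self) -> None:
        self._person_of: dict[str, int] = {}
        self._ids = count(1)

    def capture(self, distinct_id: str) -> int:
        # An unseen distinct_id gets a brand-new person record.
        if distinct_id not in self._person_of:
            self._person_of[distinct_id] = next(self._ids)
        return self._person_of[distinct_id]

    def identify(self, anon_id: str, user_id: str) -> None:
        # Attach the anonymous ID's person to the known user's person.
        anon_person = self.capture(anon_id)
        if user_id not in self._person_of:
            self._person_of[user_id] = anon_person
            return
        target = self._person_of[user_id]
        # Re-point every distinct_id of the anonymous person at the target.
        for did in [d for d, p in self._person_of.items() if p == anon_person]:
            self._person_of[did] = target

    def person_count(self) -> int:
        return len(set(self._person_of.values()))


store = ToyIdentityStore()
store.capture("anon-laptop")
store.identify("anon-laptop", "user-42")  # logged in on the laptop
store.capture("anon-phone")               # phone visit, never logged in
store.capture("anon-laptop-march")        # cookies cleared, new anonymous ID
print(store.person_count())               # 3 person records, 1 human
```

Calling `store.identify("anon-phone", "user-42")` afterwards drops the count to 2. The March laptop visit stays a ghost until that visitor logs in again.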

Why it matters more than you think

Ghost profiles don't just inflate a vanity metric. They break everything downstream.

Your funnels are wrong. A user who signed up on their phone and converted on their laptop shows up as two separate journeys. One person completed the funnel, but PostHog sees one dropout and one direct conversion. Your conversion rate is wrong in both directions.
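A toy calculation shows how a single split user moves the number. The event log, `conversion_rate`, and `merge_alice` below are made up for illustration, and step order is ignored for brevity, which a real funnel would not do.

```python
# Hypothetical event log: (person_record_id, event_name).
# "alice-phone" and "alice-laptop" are two ghost records of one human.
events = [
    ("alice-phone", "signed_up"),
    ("alice-laptop", "purchased"),
    ("bob", "signed_up"),
    ("bob", "purchased"),
    ("carol", "signed_up"),
]


def conversion_rate(events, first, second, resolve=lambda pid: pid):
    """Share of people who did `first` that also did `second` (order ignored)."""
    done = {}
    for pid, name in events:
        done.setdefault(resolve(pid), set()).add(name)
    entered = [p for p, names in done.items() if first in names]
    converted = [p for p in entered if second in done[p]]
    return len(converted) / len(entered)


def merge_alice(pid):
    # Hypothetical resolution step: fold both ghost records into one human.
    return "alice" if pid.startswith("alice") else pid


print(conversion_rate(events, "signed_up", "purchased"))               # 1 of 3
print(conversion_rate(events, "signed_up", "purchased", merge_alice))  # 2 of 3
```

Without resolution, "alice-phone" counts as a dropout and "alice-laptop" as a purchaser who never signed up; resolving them doubles the measured conversion rate in this tiny example.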

Your retention is wrong. When the same user is counted twice, their activity is split across two records. They can look churned on one device while still active on another, or show up as a "new" user when they simply switched devices. Your retention curves are lying to you.

Your cohorts are wrong. Behavioral cohorts include ghost profiles. You're targeting "users who did X but not Y" — except some of them did Y, just on a different device under a different person record.

Your A/B tests are wrong. If the same person is in your experiment as two different participants, potentially in different variants, your results have noise you can't account for.
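If you can map person records to a deduplicated human key (for example by matching emails), checking an experiment for cross-variant contamination is a short loop. `cross_variant_humans` and the `human_of` mapping are hypothetical; building that mapping is the hard part and is simply assumed here.

```python
from collections import defaultdict


def cross_variant_humans(exposures, human_of):
    """Return humans whose ghost records landed in more than one variant.

    exposures: iterable of (person_record_id, variant) pairs.
    human_of:  mapping from person_record_id to a deduplicated human key
               (assumed to exist; records missing from it map to themselves).
    """
    variants = defaultdict(set)
    for record_id, variant in exposures:
        variants[human_of.get(record_id, record_id)].add(variant)
    return {h: sorted(v) for h, v in variants.items() if len(v) > 1}


exposures = [("p1", "control"), ("p2", "test"), ("p3", "test")]
human_of = {"p1": "dana", "p2": "dana"}  # p1 and p2 are the same person
print(cross_variant_humans(exposures, human_of))  # {'dana': ['control', 'test']}
```

Every human this returns saw both arms of the test, so their behavior cannot be cleanly attributed to either variant.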

You're making product decisions based on data that's 15-30% wrong. That's the real cost.

How bad is it in your project?

We built a free tool that connects to your PostHog instance and tells you exactly how many ghost profiles you have. It checks three signals:

  1. Email duplicates — multiple person records with the same email address. PostHog should have merged these but didn't.

  2. Phone duplicates — same thing with phone numbers.

  3. Device ID duplicates — PostHog's own SDKs set a $device_id on every event. If two different person records have events from the same device ID, that's the same browser or phone creating two people. This catches anonymous duplicates that email matching can't.
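Those three signals amount to grouping records that share any key, which is a classic union-find job. The sketch below is our own illustration of that idea, not the audit tool's code. It assumes each person's email, phone, and the set of `$device_id` values seen on its events have already been exported into plain dicts; lowercasing emails and keeping only the digits of phone numbers are assumed normalization choices, and `find_duplicate_clusters` is a hypothetical name.

```python
from collections import defaultdict


def find_duplicate_clusters(persons):
    """Group person records that share an email, phone, or $device_id.

    persons: dict of person_id -> dict with optional "email", "phone",
             and "device_ids" (set of $device_id values seen on its events).
    Returns sorted clusters (sorted lists of person_ids) with 2+ members.
    """
    parent = {pid: pid for pid in persons}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_seen = {}  # (signal, normalized value) -> first person_id carrying it
    for pid, props in persons.items():
        keys = []
        email = (props.get("email") or "").strip().lower()
        if email:
            keys.append(("email", email))
        digits = "".join(ch for ch in (props.get("phone") or "") if ch.isdigit())
        if digits:
            keys.append(("phone", digits))
        keys.extend(("device", d) for d in props.get("device_ids", ()))
        for key in keys:
            if key in first_seen:
                union(first_seen[key], pid)
            else:
                first_seen[key] = pid

    clusters = defaultdict(list)
    for pid in persons:
        clusters[find(pid)].append(pid)
    return sorted(sorted(c) for c in clusters.values() if len(c) > 1)


persons = {
    "p1": {"email": "Ann@Example.com", "device_ids": {"dev-A"}},
    "p2": {"email": "ann@example.com"},             # email duplicate of p1
    "p3": {"device_ids": {"dev-A", "dev-B"}},       # device duplicate of p1
    "p4": {"phone": "+1 555 010 0000"},
    "p5": {"phone": "15550100000"},                 # phone duplicate of p4
    "p6": {"email": "zoe@example.com"},
}
print(find_duplicate_clusters(persons))  # [['p1', 'p2', 'p3'], ['p4', 'p5']]
```

Device-ID matching is the aggressive signal: a shared family laptop will link two genuinely different people, so clusters found only through `$device_id` deserve a manual look before merging.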

It takes about 60 seconds. No code changes, nothing gets modified in your PostHog instance. You paste a read-only API key and get a report.

Run a free audit on your PostHog data

What you can do about it

Once you know the scope of the problem, you have a few options:

Manual merges. PostHog lets you merge person records manually in the UI. Fine if you have 20 duplicates. Not practical if you have 2,000.

Better identify() hygiene. Make sure every client calls posthog.identify() with a consistent user ID as early as possible. This prevents some duplicates going forward but doesn't fix the ones already in your data.

CrossTrack. We're building an identity resolution layer that sits alongside PostHog. It detects duplicates in real time using email, phone, device fingerprinting, and a WebView bridge that links app sessions to in-app browser sessions. The SDK is open source, 3.2 KB, zero dependencies.

The number you should actually be tracking

Instead of "persons" in PostHog, look at your ratio of distinct IDs to persons. If you have 50,000 persons and 80,000 distinct IDs, that's 1.6 distinct IDs per person on average. Some of that is normal (one anonymous ID + one identified ID = 2). But if your average is above 2.0, or if you find persons with 5, 10, 20 distinct IDs — your identity resolution has gaps.
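The ratio is cheap to compute once you have a distinct-ID-to-person mapping exported (the export step is left out here). `distinct_id_ratio` is a hypothetical helper, and its default threshold of 5 simply mirrors the "5, 10, 20" figure above as an arbitrary starting point. For the example numbers in this post, 80,000 / 50,000 = 1.6.

```python
from collections import Counter


def distinct_id_ratio(distinct_id_to_person, outlier_threshold=5):
    """Average distinct IDs per person, plus persons at or above a threshold.

    distinct_id_to_person: mapping of distinct_id -> person_id, assumed to be
    exported from your analytics data.
    """
    ids_per_person = Counter(distinct_id_to_person.values())
    if not ids_per_person:
        return 0.0, []
    ratio = len(distinct_id_to_person) / len(ids_per_person)
    outliers = sorted(p for p, n in ids_per_person.items() if n >= outlier_threshold)
    return ratio, outliers


mapping = {"anon-1": "p1", "user-1": "p1", "anon-2": "p2", "user-2": "p2", "anon-3": "p3"}
mapping.update({f"p4-id{i}": "p4" for i in range(5)})  # one person with 5 IDs
ratio, outliers = distinct_id_ratio(mapping)
print(ratio, outliers)  # 2.5 ['p4']
```

Here p1 and p2 follow the normal "one anonymous + one identified" pattern, while p4 alone pushes the average above 2.0 and is the record worth inspecting first.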

The audit report shows you this ratio along with the specific duplicate clusters. At minimum, it tells you whether your user count is a number you can trust.

Check your PostHog data for free
