Dacid Chain
IDs First, Profiles Later: A Cheaper Way to Analyze Follower Graphs

Most audience-analysis projects start with one deceptively simple question:

Which people are shared across these audiences, and which people are unique?

That question sounds like marketing.

In practice, it is a data engineering problem.

If you are analyzing creators, competitors, communities, conferences, or niche accounts, the expensive mistake is usually the same:

You hydrate every profile too early.

You pull usernames, bios, avatars, follower counts, descriptions, and other profile fields before you know which users are actually worth inspecting.

For many social graph workflows, the better pattern is:

IDs first. Profiles later.

Collect the graph as raw IDs. Run set operations. Find the interesting segments. Hydrate only the users that survive that first pass.

This post walks through that pattern using Twitter/X-style follower data as the example.

The same idea applies to any API or dataset where:

  • ID-only edges are cheaper than full objects,
  • relationships matter more than profile fields at first,
  • you can enrich selected IDs later,
  • and your real goal is overlap, uniqueness, change detection, or lead scoring.

Core pattern

Collect cheap relationship data broadly, analyze structure first, and enrich expensive objects selectively.

Follower data use cases: audience overlap, incremental reach, competitive research, weekly snapshots, and lead discovery

A follower list is less useful as a list, and more useful as a relationship map you can query.




TL;DR

| Step | What to do | Why it helps |
|---|---|---|
| 1 | Pull follower IDs | Collect graph edges cheaply |
| 2 | Store raw relationships | Keep the data model simple |
| 3 | Run set operations | Find overlap, uniqueness, and repeated appearances |
| 4 | Pick useful segments | Avoid enriching low-signal users |
| 5 | Hydrate profiles later | Spend API calls only where they matter |

Rule of thumb

If your first question is about relationships, do not start by fetching profile metadata.

This reduces cost, makes experiments faster, and keeps your data model cleaner.


Why full profiles are often premature

A full follower profile is useful when you need to display or filter by:

  • username,
  • display name,
  • bio,
  • avatar,
  • follower count,
  • location,
  • public metrics.

But many first-pass questions do not need any of those fields.

For example:

  • Which users follow both creator A and creator B?
  • Which followers are shared by three competitors?
  • Which creator has the most incremental reach?
  • Which accounts appeared in this week's snapshot but not last week's?
  • Which IDs show up across multiple high-intent communities?

Those questions need relationships, not bios.

Once you treat the problem as graph analysis, the data you need first is much smaller:

```
source_account_id -> follower_user_id
```

That edge is enough to compute a surprising amount of value.
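As a tiny sketch of why: even a plain in-memory Map from source account to a Set of follower IDs is enough to start answering overlap questions. The account names and follower IDs below are made up.

```javascript
// Build an in-memory adjacency map from raw (source, follower) edges.
// Both the account names and follower IDs here are made-up placeholders.
const edges = [
  ["acct_a", "1001"],
  ["acct_a", "1002"],
  ["acct_b", "1002"],
  ["acct_b", "1003"],
];

const followersBySource = new Map();
for (const [source, follower] of edges) {
  if (!followersBySource.has(source)) {
    followersBySource.set(source, new Set());
  }
  followersBySource.get(source).add(follower);
}

// "1002" follows both accounts, so it is the overlap.
const shared = [...followersBySource.get("acct_a")].filter((id) =>
  followersBySource.get("acct_b").has(id)
);
console.log(shared); // → [ '1002' ]
```

Everything later in this post is a scaled-up version of this Map-and-Set pattern.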


The architecture

Think of the pipeline in three stages:

| Stage | Input | Output |
|---|---|---|
| Collect IDs | Target accounts | source_account_id -> follower_user_id edges |
| Analyze graph | Raw ID lists | Overlap, uniqueness, frequency, growth, decay |
| Hydrate selected profiles | High-signal IDs | Usernames, bios, metrics, exports |

```
Get followers -> store edges -> find signals -> hydrate later
```

That separation is the whole trick.

It keeps graph collection cheap and broad, while keeping profile enrichment focused and narrow.

Follower association map showing shared audience across three accounts and signals to query

The graph tells you who is worth inspecting before you pay to hydrate full profiles.


A concrete cost model

To make the math less abstract, here is one real pricing example.

TwitterAPI.io added an IDs-only follower endpoint in May 2026. Their docs show that /twitter/user/followers_ids can return up to 5,000 follower IDs per call, with the largest tier priced at 0.45 credits per ID.

They also document:

```
100,000 credits = $1
1 credit = $0.00001
```

So a 5,000-ID page costs:

```
5,000 IDs * 0.45 credits = 2,250 credits
2,250 credits = $0.0225
```

Now compare three ways to map an account with 50,000 followers:

| Approach | What you collect | Approx. cost |
|---|---|---|
| Old full-profile pull | 50,000 full profiles at 15 credits each | $7.50 |
| New full-profile pull | 50,000 full profiles at 1 credit each | $0.50 |
| IDs-first pull | 50,000 raw follower IDs | $0.225 |

The exact numbers will depend on your provider and plan. The point is the shape of the economics:

If IDs are cheaper than profiles, do graph work before profile work.

Cost-model assumptions

This example uses one public API pricing model as a concrete reference point. The important takeaway is not the exact vendor or exact number. The important takeaway is the relationship between cheap edge collection and more expensive object hydration.

If your provider prices IDs and full profiles differently, model the same workflow in three columns:

| Question | What to estimate |
|---|---|
| How many IDs do I need? | target accounts * followers per account |
| How many profiles do I actually inspect? | overlap segment, unique segment, high-frequency segment |
| What happens if I hydrate everything first? | total IDs * full-profile price |
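Those three estimates are simple enough to script. Here is a sketch using the example rates from this post (0.45 credits per ID, 1 credit per hydrated profile, 100,000 credits per dollar); substitute your provider's real prices.

```javascript
// Rough cost model: IDs-first vs. hydrate-everything.
// These rates are the example prices from this post, not universal constants.
const CREDITS_PER_DOLLAR = 100_000;
const CREDITS_PER_ID = 0.45;
const CREDITS_PER_PROFILE = 1;

function dollars(credits) {
  return credits / CREDITS_PER_DOLLAR;
}

function costModel({ totalIds, inspectedProfiles }) {
  return {
    // Collect all IDs cheaply, then hydrate only the surviving segment.
    idsFirst: dollars(
      totalIds * CREDITS_PER_ID + inspectedProfiles * CREDITS_PER_PROFILE
    ),
    // Hydrate every follower up front.
    hydrateEverything: dollars(totalIds * CREDITS_PER_PROFILE),
  };
}

// 50,000 followers, but only 6,210 survive the graph pass.
console.log(costModel({ totalIds: 50_000, inspectedProfiles: 6_210 }));
// → { idsFirst: 0.2871, hydrateEverything: 0.5 }
```

The gap widens as the surviving segment shrinks relative to the full list, which is exactly the situation the graph pass is designed to create.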

Minimal data model

You do not need a graph database on day one.

A relational table can take you far:

```sql
CREATE TABLE twitter_follow_edges (
  source_user_id TEXT NOT NULL,
  follower_user_id TEXT NOT NULL,
  collected_at TIMESTAMP NOT NULL,
  PRIMARY KEY (source_user_id, follower_user_id)
);
```

Use TEXT for IDs. Twitter/X IDs can exceed JavaScript's safe integer range, and many APIs return them as strings for that reason.
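You can see the problem directly in Node. The 19-digit ID below is made up, but it is the right shape for a snowflake-style 64-bit ID:

```javascript
// Twitter/X snowflake-style IDs are 64-bit integers; JavaScript Numbers
// are IEEE-754 doubles with only 53 bits of integer precision.
const id = "1846381374648332289"; // made-up 19-digit ID

console.log(Number.MAX_SAFE_INTEGER); // → 9007199254740991 (about 9.0e15)
console.log(Number(id) > Number.MAX_SAFE_INTEGER); // → true

// The round trip through Number silently changes the digits -- this is
// why many APIs return IDs as strings and why the edges table uses TEXT.
console.log(String(Number(id)) === id); // → false

// BigInt round-trips exactly, but TEXT columns are usually the simpler fix.
console.log(String(BigInt(id)) === id); // → true
```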

You can add hydrated profile data separately:

```sql
CREATE TABLE twitter_user_profiles (
  user_id TEXT PRIMARY KEY,
  username TEXT,
  display_name TEXT,
  bio TEXT,
  followers_count INTEGER,
  hydrated_at TIMESTAMP NOT NULL
);
```

Keeping edges and profiles separate makes the pipeline easier to reason about:

  • edge collection answers graph questions,
  • profile hydration answers identity and filtering questions.

Design note

Keep relationship collection and profile enrichment as separate jobs. That makes retries, deduplication, cost tracking, and data retention much easier to reason about.


Collecting follower IDs

A typical IDs-only request looks like this:

```sh
curl "https://api.twitterapi.io/twitter/user/followers_ids?userName=elonmusk&count=5000" \
  -H "X-API-Key: YOUR_API_KEY"
```

The important detail is pagination.

Store next_cursor and keep fetching until the response says there are no more pages.

JavaScript collector example

```javascript
async function collectFollowerIds(userName) {
  let cursor = "";
  const ids = [];

  while (true) {
    const url = new URL("https://api.twitterapi.io/twitter/user/followers_ids");
    url.searchParams.set("userName", userName);
    url.searchParams.set("count", "5000");

    if (cursor) {
      url.searchParams.set("cursor", cursor);
    }

    const response = await fetch(url, {
      headers: {
        "X-API-Key": process.env.TWITTERAPI_IO_KEY,
      },
    });

    if (!response.ok) {
      throw new Error(`Request failed: ${response.status}`);
    }

    const data = await response.json();

    ids.push(...data.ids);

    if (!data.has_next_page || !data.next_cursor || data.next_cursor === "0") {
      break;
    }

    cursor = data.next_cursor;
  }

  return ids;
}
```

Production checklist

  • retry with exponential backoff,
  • request logging,
  • rate-limit handling,
  • cursor checkpointing,
  • deduplication before writing,
  • a collection_job_id for traceability.

But the core loop is small.
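A minimal version of the first two checklist items might look like the sketch below. It assumes a fetch-compatible runtime and treats HTTP 429 and 5xx as retryable; adjust both assumptions for your provider.

```javascript
// Fetch with exponential backoff. Retries on 429/5xx responses and on
// network errors; any other response is returned to the caller as-is.
async function fetchWithRetry(url, options = {}, maxAttempts = 5) {
  let delayMs = 500;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const response = await fetch(url, options);
      const retryable = response.status === 429 || response.status >= 500;
      if (!retryable || attempt === maxAttempts) {
        return response;
      }
      console.warn(`Attempt ${attempt} got ${response.status}, retrying in ${delayMs}ms`);
    } catch (err) {
      if (attempt === maxAttempts) throw err;
      console.warn(`Attempt ${attempt} failed (${err.message}), retrying in ${delayMs}ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, delayMs));
    delayMs *= 2; // exponential backoff: 500ms, 1s, 2s, ...
  }
}
```

Wrapping the collector's `fetch` call with this keeps the core pagination loop unchanged while absorbing transient failures.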


Use-case map

Use case Graph question First segment to hydrate
Audience overlap Who follows multiple accounts? shared followers
Incremental reach Who is unique to one creator? unique followers
Competitive research Who follows several tools in a category? multi-competitor followers
Weekly snapshots Who appeared or disappeared? newly gained or lost followers
Lead discovery Who repeats across high-intent lists? high-frequency IDs

The examples below all use the same principle: find the signal with IDs first, then enrich the segment that matters.

Use case 1: audience overlap

Suppose you are researching three AI-focused accounts:

```
@founder_ai
@ml_builder
@agent_tools
```

Toy dataset:

| Account | Followers collected |
|---|---|
| @founder_ai | 120,000 |
| @ml_builder | 85,000 |
| @agent_tools | 64,000 |

Once you have ID lists, overlap is just set math:

Set-intersection example

```javascript
const founderAI = new Set(founderAiFollowerIds);
const mlBuilder = new Set(mlBuilderFollowerIds);
const agentTools = new Set(agentToolsFollowerIds);

function intersection(a, b) {
  return new Set([...a].filter((id) => b.has(id)));
}

function threeWayIntersection(a, b, c) {
  return new Set([...a].filter((id) => b.has(id) && c.has(id)));
}

const founderAndML = intersection(founderAI, mlBuilder);
const founderAndAgent = intersection(founderAI, agentTools);
const mlAndAgent = intersection(mlBuilder, agentTools);
const allThree = threeWayIntersection(founderAI, mlBuilder, agentTools);

console.log({
  founderAndML: founderAndML.size,
  founderAndAgent: founderAndAgent.size,
  mlAndAgent: mlAndAgent.size,
  allThree: allThree.size,
});
```

Example output:

```json
{
  "founderAndML": 18420,
  "founderAndAgent": 12380,
  "mlAndAgent": 10955,
  "allThree": 6210
}
```

The 6,210 users following all three accounts are probably more interesting than the average follower.

That is the segment I would hydrate first.

What this gives you

A smaller, higher-intent segment that is easier to inspect, export, score, or enrich.

Use case 2: incremental reach

Follower count alone is a weak metric.

If two creators have the same audience, buying both sponsorships may not add much reach. A smaller creator with a more unique audience can be more valuable.

You can estimate uniqueness like this:

Unique-audience example

```javascript
function uniqueToA(a, b, c) {
  return new Set([...a].filter((id) => !b.has(id) && !c.has(id)));
}

const uniqueFounderAI = uniqueToA(founderAI, mlBuilder, agentTools);
const uniqueMLBuilder = uniqueToA(mlBuilder, founderAI, agentTools);
const uniqueAgentTools = uniqueToA(agentTools, founderAI, mlBuilder);

console.log({
  uniqueFounderAI: uniqueFounderAI.size,
  uniqueMLBuilder: uniqueMLBuilder.size,
  uniqueAgentTools: uniqueAgentTools.size,
});
```

Example output:

```json
{
  "uniqueFounderAI": 81400,
  "uniqueMLBuilder": 52900,
  "uniqueAgentTools": 39800
}
```

Now you can rank creators by incremental reach, not just total followers.
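One way to turn those counts into a single score is the ratio of unique followers to total followers. That formula is my choice for this post, not a standard metric, but it is the one behind the dashboard numbers later on:

```javascript
// Incremental reach score: what fraction of an audience is unique?
// The formula (unique / total, rounded to 2 places) is one reasonable
// choice, not an industry standard.
function incrementalReachScore(uniqueCount, totalCount) {
  return Number((uniqueCount / totalCount).toFixed(2));
}

console.log(incrementalReachScore(81_400, 120_000)); // → 0.68
console.log(incrementalReachScore(52_900, 85_000));  // → 0.62
console.log(incrementalReachScore(39_800, 64_000));  // → 0.62
```

A smaller account can outrank a larger one on this score, which is exactly the point: you are buying reach you do not already have.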

Use case 3: competitive audience research

For competitive research, graph data can tell you how users relate to a category.

Imagine you are building a SaaS product for sales teams. You collect follower IDs for:

```
@crm_tool_a
@sales_ai_b
@pipeline_app_c
```

Then segment users by relationship:

| Segment | Possible interpretation |
|---|---|
| Follows only competitor A | Possibly loyal to one tool |
| Follows A and B | Comparing alternatives |
| Follows all competitors | Strong category interest |
| Follows competitor + analyst | Higher-intent audience |
| Follows competitor but not your brand | Potential acquisition audience |

None of this requires profile bios at the first step.

The graph gets you to a shortlist.
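The segmentation itself is just set membership tests. A sketch, assuming the Sets of follower IDs have already been collected (the segment tags mirror the table above and are my own names):

```javascript
// Tag a follower ID with competitive segments based on which competitor
// audiences it appears in. Inputs are Sets of ID strings.
function segmentsFor(id, { crmToolA, salesAiB, pipelineAppC, yourBrand }) {
  const follows = [crmToolA.has(id), salesAiB.has(id), pipelineAppC.has(id)];
  const count = follows.filter(Boolean).length;

  const tags = [];
  if (count === 1) tags.push("single-tool");
  if (count === 2) tags.push("comparing-alternatives");
  if (count === 3) tags.push("category-interest");
  if (count >= 1 && !yourBrand.has(id)) tags.push("acquisition-audience");
  return tags;
}

// Toy data: follower "3" follows all three competitors but not your brand.
const audiences = {
  crmToolA: new Set(["1", "2", "3"]),
  salesAiB: new Set(["2", "3"]),
  pipelineAppC: new Set(["3"]),
  yourBrand: new Set(["1"]),
};

console.log(segmentsFor("3", audiences));
// → [ 'category-interest', 'acquisition-audience' ]
```

Run this over every distinct follower ID and you have the shortlist, with no profile fetches yet.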

Why this matters

Competitive follower data becomes more useful when you treat it as category-intent data, not as a flat export.

Use case 4: weekly audience snapshots

IDs are also useful for tracking changes over time.

For a creator CRM or audience analytics product, you can collect weekly snapshots:

```
snapshot_2026_05_01
snapshot_2026_05_08
snapshot_2026_05_15
```

Then compare sets:

Snapshot comparison example

```javascript
function difference(a, b) {
  return new Set([...a].filter((id) => !b.has(id)));
}

const gained = difference(currentWeekFollowers, lastWeekFollowers);
const lost = difference(lastWeekFollowers, currentWeekFollowers);

console.log({
  gainedFollowers: gained.size,
  lostFollowers: lost.size,
  netChange: gained.size - lost.size,
});
```

Example output:

```json
{
  "gainedFollowers": 8421,
  "lostFollowers": 1960,
  "netChange": 6461
}
```

That can help you detect:

  • which campaign drove new followers,
  • whether an audience changed after a launch,
  • whether growth is coming from category-relevant users,
  • which new followers should be enriched and scored.

Again: IDs first, profiles later.

Use case 5: lead discovery

For B2B lead discovery, the interesting users are rarely "all followers."

They are users matching patterns like:

  • follows 3+ accounts in your niche,
  • follows your competitor,
  • follows a relevant conference,
  • follows a known category analyst,
  • recently followed a high-intent account.

One simple scoring approach:

Frequency-scoring example

```javascript
const audiences = [
  founderAiFollowerIds,
  mlBuilderFollowerIds,
  agentToolsFollowerIds,
  aiConferenceFollowerIds,
];

const score = new Map();

for (const audience of audiences) {
  // Dedupe within each list so a repeated ID cannot double-count.
  for (const id of new Set(audience)) {
    score.set(id, (score.get(id) || 0) + 1);
  }
}

const highIntentIds = [...score.entries()]
  .filter(([, count]) => count >= 3)
  .map(([id]) => id);

console.log(`${highIntentIds.length} high-intent accounts found`);
```

Example output:

```
9842 high-intent accounts found
```

At that point, hydrating fewer than 10,000 users makes more sense than hydrating hundreds of thousands.


Querying overlap in SQL

If you store edges in Postgres, a three-account overlap query can be simple:

```sql
SELECT follower_user_id
FROM twitter_follow_edges
WHERE source_user_id IN ('account_a', 'account_b', 'account_c')
GROUP BY follower_user_id
HAVING COUNT(DISTINCT source_user_id) = 3;
```

For pairwise overlap counts:

```sql
SELECT
  a.source_user_id AS account_a,
  b.source_user_id AS account_b,
  COUNT(*) AS shared_followers
FROM twitter_follow_edges a
JOIN twitter_follow_edges b
  ON a.follower_user_id = b.follower_user_id
 AND a.source_user_id < b.source_user_id
GROUP BY a.source_user_id, b.source_user_id
ORDER BY shared_followers DESC;
```

This is enough for a first version of an overlap dashboard.


When to hydrate profiles

Hydrate profiles when the graph has already narrowed the problem.

| Hydrate when... | Why |
|---|---|
| Users are in the overlap of several target audiences | They likely represent stronger category interest |
| Users are unique to a high-value creator | They may add incremental reach |
| Users were newly gained after a campaign | They can explain movement |
| Users appear across several niche lists | They may be lead candidates |
| Users are selected for export, scoring, or sales review | The profile data has an immediate use |
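When a segment does clear that bar, hydrate it in batches rather than one call per user. A sketch — `hydrateBatch` here is a stand-in for whatever bulk profile endpoint your provider offers, not a real TwitterAPI.io route:

```javascript
// Hydrate a selected segment in fixed-size batches.
// `hydrateBatch` is injected: it takes an array of IDs and should return
// an array of profile objects from your provider's bulk endpoint.
async function hydrateSegment(ids, hydrateBatch, batchSize = 100) {
  const profiles = [];
  for (let i = 0; i < ids.length; i += batchSize) {
    const batch = ids.slice(i, i + batchSize);
    profiles.push(...(await hydrateBatch(batch)));
  }
  return profiles;
}
```

Because the segment arrives as a plain array of IDs, the same function works whether the segment came from set math in JavaScript or a `GROUP BY` in Postgres.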

Bad hydration trigger:

> "We collected the ID, so we might as well fetch everything."

That is how small graph experiments turn into large bills.


A small dashboard idea

If I were building a weekend project around this, I would build an audience overlap dashboard:

  1. Input 3-10 Twitter/X usernames.
  2. Pull follower IDs for each account.
  3. Save edges in Postgres.
  4. Compute pairwise overlap, multi-way overlap, unique audience, and repeated followers.
  5. Hydrate only the top segments.
  6. Export CSVs for deeper research.

Example output:

| Creator | Followers | Unique followers | Shared with others | Incremental reach score |
|---|---|---|---|---|
| @founder_ai | 120,000 | 81,400 | 38,600 | 0.68 |
| @ml_builder | 85,000 | 52,900 | 32,100 | 0.62 |
| @agent_tools | 64,000 | 39,800 | 24,200 | 0.62 |

The dashboard does not need to answer every question.

It just needs to answer the first useful one:

Where is the audience overlap, and which users are worth inspecting next?


Tradeoffs

IDs-first is not always the right default.

| Use IDs-first when... | Use full profiles earlier when... |
|---|---|
| Graph structure is the main question | Your product needs immediate profile display |
| Lists are large | Filtering depends on bio or public metrics |
| IDs are cheaper than full profiles | You only collect small lists |
| You can enrich later | Your API charges the same for IDs and profiles |
| You need fast experimentation | Compliance rules affect stored relationship data |

But when graph structure is the main question, IDs-first is often cleaner and cheaper.


Final takeaway

The bigger lesson is not about one endpoint or one provider.

It is a general data engineering pattern:

> Collect cheap relationship data broadly.
> Analyze structure first.
> Enrich expensive objects selectively.

For follower graphs, that means:

**IDs first. Profiles later.**

That one design choice can make audience overlap, incremental reach analysis, weekly snapshots, and lead discovery much more practical.
