Dacid Chain
IDs First, Profiles Later: A Cheaper Way to Analyze Follower Graphs

Most audience-analysis projects start with one deceptively simple question:

Which people are shared across these audiences, and which people are unique?

That question sounds like marketing.

In practice, it is a data engineering problem.

If you are analyzing creators, competitors, communities, conferences, or niche accounts, the expensive mistake is usually the same:

You hydrate every profile too early.

You pull usernames, bios, avatars, follower counts, descriptions, and other profile fields before you know which users are actually worth inspecting.

For many social graph workflows, the better pattern is:

IDs first. Profiles later.

Collect the graph as raw IDs. Run set operations. Find the interesting segments. Hydrate only the users that survive that first pass.

This post walks through that pattern using Twitter/X-style follower data as the example.

The same idea applies to any API or dataset where:

  • ID-only edges are cheaper than full objects,
  • relationships matter more than profile fields at first,
  • you can enrich selected IDs later,
  • and your real goal is overlap, uniqueness, change detection, or lead scoring.

Core pattern

Collect cheap relationship data broadly, analyze structure first, and enrich expensive objects selectively.

Follower data use cases: audience overlap, incremental reach, competitive research, weekly snapshots, and lead discovery

A follower list is less useful as a list, and more useful as a relationship map you can query.




TL;DR

| Step | What to do | Why it helps |
|---|---|---|
| 1 | Pull follower IDs | Collect graph edges cheaply |
| 2 | Store raw relationships | Keep the data model simple |
| 3 | Run set operations | Find overlap, uniqueness, and repeated appearances |
| 4 | Pick useful segments | Avoid enriching low-signal users |
| 5 | Hydrate profiles later | Spend API calls only where they matter |

Rule of thumb

If your first question is about relationships, do not start by fetching profile metadata.

This reduces cost, makes experiments faster, and keeps your data model cleaner.


Why full profiles are often premature

A full follower profile is useful when you need to display or filter by:

  • username,
  • display name,
  • bio,
  • avatar,
  • follower count,
  • location,
  • public metrics.

But many first-pass questions do not need any of those fields.

For example:

  • Which users follow both creator A and creator B?
  • Which followers are shared by three competitors?
  • Which creator has the most incremental reach?
  • Which accounts appeared in this week's snapshot but not last week's?
  • Which IDs show up across multiple high-intent communities?

Those questions need relationships, not bios.

Once you treat the problem as graph analysis, the data you need first is much smaller:

```
source_account_id -> follower_user_id
```

That edge is enough to compute a surprising amount of value.
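As a tiny sketch of why: even a plain in-memory Map from source account to a Set of follower IDs is enough to start answering overlap questions. The account names and follower IDs below are made up.

```javascript
// Build an in-memory adjacency map from raw (source, follower) edges.
// Both the account names and follower IDs here are made-up placeholders.
const edges = [
  ["acct_a", "1001"],
  ["acct_a", "1002"],
  ["acct_b", "1002"],
  ["acct_b", "1003"],
];

const followersBySource = new Map();
for (const [source, follower] of edges) {
  if (!followersBySource.has(source)) {
    followersBySource.set(source, new Set());
  }
  followersBySource.get(source).add(follower);
}

// "1002" follows both accounts, so it is the overlap.
const shared = [...followersBySource.get("acct_a")].filter((id) =>
  followersBySource.get("acct_b").has(id)
);
console.log(shared); // → [ '1002' ]
```

Everything later in this post is a scaled-up version of this Map-and-Set pattern.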


The architecture

Think of the pipeline in three stages:

| Stage | Input | Output |
|---|---|---|
| Collect IDs | Target accounts | source_account_id -> follower_user_id edges |
| Analyze graph | Raw ID lists | Overlap, uniqueness, frequency, growth, decay |
| Hydrate selected profiles | High-signal IDs | Usernames, bios, metrics, exports |

```
Get followers -> store edges -> find signals -> hydrate later
```

That separation is the whole trick.

It keeps graph collection cheap and broad, while keeping profile enrichment focused and narrow.

Follower association map showing shared audience across three accounts and signals to query

The graph tells you who is worth inspecting before you pay to hydrate full profiles.


A concrete cost model

To make the math less abstract, here is one real pricing example.

TwitterAPI.io added an IDs-only follower endpoint in May 2026. Their docs show that /twitter/user/followers_ids can return up to 5,000 follower IDs per call, with the largest tier priced at 0.45 credits per ID.

They also document:

```
100,000 credits = $1
1 credit = $0.00001
```

So a 5,000-ID page costs:

```
5,000 IDs * 0.45 credits = 2,250 credits
2,250 credits = $0.0225
```

Now compare three ways to map an account with 50,000 followers:

| Approach | What you collect | Approx. cost |
|---|---|---|
| Old full-profile pull | 50,000 full profiles at 15 credits each | $7.50 |
| New full-profile pull | 50,000 full profiles at 1 credit each | $0.50 |
| IDs-first pull | 50,000 raw follower IDs | $0.225 |

The exact numbers will depend on your provider and plan. The point is the shape of the economics:

If IDs are cheaper than profiles, do graph work before profile work.

Cost-model assumptions

This example uses one public API pricing model as a concrete reference point. The important takeaway is not the exact vendor or exact number. The important takeaway is the relationship between cheap edge collection and more expensive object hydration.

If your provider prices IDs and full profiles differently, model the same workflow in three columns:

| Question | What to estimate |
|---|---|
| How many IDs do I need? | target accounts * followers per account |
| How many profiles do I actually inspect? | overlap segment, unique segment, high-frequency segment |
| What happens if I hydrate everything first? | total IDs * full-profile price |
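Those three estimates are simple enough to script. Here is a sketch using the example rates from this post (0.45 credits per ID, 1 credit per hydrated profile, 100,000 credits per dollar); substitute your provider's real prices.

```javascript
// Rough cost model: IDs-first vs. hydrate-everything.
// These rates are the example prices from this post, not universal constants.
const CREDITS_PER_DOLLAR = 100_000;
const CREDITS_PER_ID = 0.45;
const CREDITS_PER_PROFILE = 1;

function dollars(credits) {
  return credits / CREDITS_PER_DOLLAR;
}

function costModel({ totalIds, inspectedProfiles }) {
  return {
    // Collect all IDs cheaply, then hydrate only the surviving segment.
    idsFirst: dollars(
      totalIds * CREDITS_PER_ID + inspectedProfiles * CREDITS_PER_PROFILE
    ),
    // Hydrate every follower up front.
    hydrateEverything: dollars(totalIds * CREDITS_PER_PROFILE),
  };
}

// 50,000 followers, but only 6,210 survive the graph pass.
console.log(costModel({ totalIds: 50_000, inspectedProfiles: 6_210 }));
// → { idsFirst: 0.2871, hydrateEverything: 0.5 }
```

The gap widens as the surviving segment shrinks relative to the full list, which is exactly the situation the graph pass is designed to create.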

Minimal data model

You do not need a graph database on day one.

A relational table can take you far:

```sql
CREATE TABLE twitter_follow_edges (
  source_user_id TEXT NOT NULL,
  follower_user_id TEXT NOT NULL,
  collected_at TIMESTAMP NOT NULL,
  PRIMARY KEY (source_user_id, follower_user_id)
);
```

Use TEXT for IDs. Twitter/X IDs can exceed JavaScript's safe integer range, and many APIs return them as strings for that reason.
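You can see the problem directly in Node. The 19-digit ID below is made up, but it is the right shape for a snowflake-style 64-bit ID:

```javascript
// Twitter/X snowflake-style IDs are 64-bit integers; JavaScript Numbers
// are IEEE-754 doubles with only 53 bits of integer precision.
const id = "1846381374648332289"; // made-up 19-digit ID

console.log(Number.MAX_SAFE_INTEGER); // → 9007199254740991 (about 9.0e15)
console.log(Number(id) > Number.MAX_SAFE_INTEGER); // → true

// The round trip through Number silently changes the digits -- this is
// why many APIs return IDs as strings and why the edges table uses TEXT.
console.log(String(Number(id)) === id); // → false

// BigInt round-trips exactly, but TEXT columns are usually the simpler fix.
console.log(String(BigInt(id)) === id); // → true
```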

You can add hydrated profile data separately:

```sql
CREATE TABLE twitter_user_profiles (
  user_id TEXT PRIMARY KEY,
  username TEXT,
  display_name TEXT,
  bio TEXT,
  followers_count INTEGER,
  hydrated_at TIMESTAMP NOT NULL
);
```

Keeping edges and profiles separate makes the pipeline easier to reason about:

  • edge collection answers graph questions,
  • profile hydration answers identity and filtering questions.

Design note

Keep relationship collection and profile enrichment as separate jobs. That makes retries, deduplication, cost tracking, and data retention much easier to reason about.


Collecting follower IDs

A typical IDs-only request looks like this:

```sh
curl "https://api.twitterapi.io/twitter/user/followers_ids?userName=elonmusk&count=5000" \
  -H "X-API-Key: YOUR_API_KEY"
```

The important detail is pagination.

Store next_cursor and keep fetching until the response says there are no more pages.

JavaScript collector example

```javascript
async function collectFollowerIds(userName) {
  let cursor = "";
  const ids = [];

  while (true) {
    const url = new URL("https://api.twitterapi.io/twitter/user/followers_ids");
    url.searchParams.set("userName", userName);
    url.searchParams.set("count", "5000");

    if (cursor) {
      url.searchParams.set("cursor", cursor);
    }

    const response = await fetch(url, {
      headers: {
        "X-API-Key": process.env.TWITTERAPI_IO_KEY,
      },
    });

    if (!response.ok) {
      throw new Error(`Request failed: ${response.status}`);
    }

    const data = await response.json();

    ids.push(...data.ids);

    if (!data.has_next_page || !data.next_cursor || data.next_cursor === "0") {
      break;
    }

    cursor = data.next_cursor;
  }

  return ids;
}
```

Production checklist

  • retry with exponential backoff,
  • request logging,
  • rate-limit handling,
  • cursor checkpointing,
  • deduplication before writing,
  • a collection_job_id for traceability.

But the core loop is small.
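A minimal version of the first two checklist items might look like the sketch below. It assumes a fetch-compatible runtime and treats HTTP 429 and 5xx as retryable; adjust both assumptions for your provider.

```javascript
// Fetch with exponential backoff. Retries on 429/5xx responses and on
// network errors; any other response is returned to the caller as-is.
async function fetchWithRetry(url, options = {}, maxAttempts = 5) {
  let delayMs = 500;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const response = await fetch(url, options);
      const retryable = response.status === 429 || response.status >= 500;
      if (!retryable || attempt === maxAttempts) {
        return response;
      }
      console.warn(`Attempt ${attempt} got ${response.status}, retrying in ${delayMs}ms`);
    } catch (err) {
      if (attempt === maxAttempts) throw err;
      console.warn(`Attempt ${attempt} failed (${err.message}), retrying in ${delayMs}ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, delayMs));
    delayMs *= 2; // exponential backoff: 500ms, 1s, 2s, ...
  }
}
```

Wrapping the collector's `fetch` call with this keeps the core pagination loop unchanged while absorbing transient failures.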


Use-case map

Use case Graph question First segment to hydrate
Audience overlap Who follows multiple accounts? shared followers
Incremental reach Who is unique to one creator? unique followers
Competitive research Who follows several tools in a category? multi-competitor followers
Weekly snapshots Who appeared or disappeared? newly gained or lost followers
Lead discovery Who repeats across high-intent lists? high-frequency IDs

The examples below all use the same principle: find the signal with IDs first, then enrich the segment that matters.

Use case 1: audience overlap

Suppose you are researching three AI-focused accounts:

```
@founder_ai
@ml_builder
@agent_tools
```

Toy dataset:

| Account | Followers collected |
|---|---|
| @founder_ai | 120,000 |
| @ml_builder | 85,000 |
| @agent_tools | 64,000 |

Once you have ID lists, overlap is just set math:

Set-intersection example

```javascript
const founderAI = new Set(founderAiFollowerIds);
const mlBuilder = new Set(mlBuilderFollowerIds);
const agentTools = new Set(agentToolsFollowerIds);

function intersection(a, b) {
  return new Set([...a].filter((id) => b.has(id)));
}

function threeWayIntersection(a, b, c) {
  return new Set([...a].filter((id) => b.has(id) && c.has(id)));
}

const founderAndML = intersection(founderAI, mlBuilder);
const founderAndAgent = intersection(founderAI, agentTools);
const mlAndAgent = intersection(mlBuilder, agentTools);
const allThree = threeWayIntersection(founderAI, mlBuilder, agentTools);

console.log({
  founderAndML: founderAndML.size,
  founderAndAgent: founderAndAgent.size,
  mlAndAgent: mlAndAgent.size,
  allThree: allThree.size,
});
```

Example output:

```json
{
  "founderAndML": 18420,
  "founderAndAgent": 12380,
  "mlAndAgent": 10955,
  "allThree": 6210
}
```

The 6,210 users following all three accounts are probably more interesting than the average follower.

That is the segment I would hydrate first.

What this gives you

A smaller, higher-intent segment that is easier to inspect, export, score, or enrich.

Use case 2: incremental reach

Follower count alone is a weak metric.

If two creators have the same audience, buying both sponsorships may not add much reach. A smaller creator with a more unique audience can be more valuable.

You can estimate uniqueness like this:

Unique-audience example

```javascript
function uniqueToA(a, b, c) {
  return new Set([...a].filter((id) => !b.has(id) && !c.has(id)));
}

const uniqueFounderAI = uniqueToA(founderAI, mlBuilder, agentTools);
const uniqueMLBuilder = uniqueToA(mlBuilder, founderAI, agentTools);
const uniqueAgentTools = uniqueToA(agentTools, founderAI, mlBuilder);

console.log({
  uniqueFounderAI: uniqueFounderAI.size,
  uniqueMLBuilder: uniqueMLBuilder.size,
  uniqueAgentTools: uniqueAgentTools.size,
});
```

Example output:

```json
{
  "uniqueFounderAI": 81400,
  "uniqueMLBuilder": 52900,
  "uniqueAgentTools": 39800
}
```

Now you can rank creators by incremental reach, not just total followers.
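One way to turn those counts into a single score is the ratio of unique followers to total followers. That formula is my choice for this post, not a standard metric, but it is the one behind the dashboard numbers later on:

```javascript
// Incremental reach score: what fraction of an audience is unique?
// The formula (unique / total, rounded to 2 places) is one reasonable
// choice, not an industry standard.
function incrementalReachScore(uniqueCount, totalCount) {
  return Number((uniqueCount / totalCount).toFixed(2));
}

console.log(incrementalReachScore(81_400, 120_000)); // → 0.68
console.log(incrementalReachScore(52_900, 85_000));  // → 0.62
console.log(incrementalReachScore(39_800, 64_000));  // → 0.62
```

A smaller account can outrank a larger one on this score, which is exactly the point: you are buying reach you do not already have.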

Use case 3: competitive audience research

For competitive research, graph data can tell you how users relate to a category.

Imagine you are building a SaaS product for sales teams. You collect follower IDs for:

```
@crm_tool_a
@sales_ai_b
@pipeline_app_c
```

Then segment users by relationship:

| Segment | Possible interpretation |
|---|---|
| Follows only competitor A | Possibly loyal to one tool |
| Follows A and B | Comparing alternatives |
| Follows all competitors | Strong category interest |
| Follows competitor + analyst | Higher-intent audience |
| Follows competitor but not your brand | Potential acquisition audience |

None of this requires profile bios at the first step.

The graph gets you to a shortlist.
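The segmentation itself is just set membership tests. A sketch, assuming the Sets of follower IDs have already been collected (the segment tags mirror the table above and are my own names):

```javascript
// Tag a follower ID with competitive segments based on which competitor
// audiences it appears in. Inputs are Sets of ID strings.
function segmentsFor(id, { crmToolA, salesAiB, pipelineAppC, yourBrand }) {
  const follows = [crmToolA.has(id), salesAiB.has(id), pipelineAppC.has(id)];
  const count = follows.filter(Boolean).length;

  const tags = [];
  if (count === 1) tags.push("single-tool");
  if (count === 2) tags.push("comparing-alternatives");
  if (count === 3) tags.push("category-interest");
  if (count >= 1 && !yourBrand.has(id)) tags.push("acquisition-audience");
  return tags;
}

// Toy data: follower "3" follows all three competitors but not your brand.
const audiences = {
  crmToolA: new Set(["1", "2", "3"]),
  salesAiB: new Set(["2", "3"]),
  pipelineAppC: new Set(["3"]),
  yourBrand: new Set(["1"]),
};

console.log(segmentsFor("3", audiences));
// → [ 'category-interest', 'acquisition-audience' ]
```

Run this over every distinct follower ID and you have the shortlist, with no profile fetches yet.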

Why this matters

Competitive follower data becomes more useful when you treat it as category-intent data, not as a flat export.

Use case 4: weekly audience snapshots

IDs are also useful for tracking changes over time.

For a creator CRM or audience analytics product, you can collect weekly snapshots:

```
snapshot_2026_05_01
snapshot_2026_05_08
snapshot_2026_05_15
```

Then compare sets:

Snapshot comparison example

```javascript
function difference(a, b) {
  return new Set([...a].filter((id) => !b.has(id)));
}

const gained = difference(currentWeekFollowers, lastWeekFollowers);
const lost = difference(lastWeekFollowers, currentWeekFollowers);

console.log({
  gainedFollowers: gained.size,
  lostFollowers: lost.size,
  netChange: gained.size - lost.size,
});
```

Example output:

```json
{
  "gainedFollowers": 8421,
  "lostFollowers": 1960,
  "netChange": 6461
}
```

That can help you detect:

  • which campaign drove new followers,
  • whether an audience changed after a launch,
  • whether growth is coming from category-relevant users,
  • which new followers should be enriched and scored.

Again: IDs first, profiles later.

Use case 5: lead discovery

For B2B lead discovery, the interesting users are rarely "all followers."

They are users matching patterns like:

  • follows 3+ accounts in your niche,
  • follows your competitor,
  • follows a relevant conference,
  • follows a known category analyst,
  • recently followed a high-intent account.

One simple scoring approach:

Frequency-scoring example

```javascript
const audiences = [
  founderAiFollowerIds,
  mlBuilderFollowerIds,
  agentToolsFollowerIds,
  aiConferenceFollowerIds,
];

const score = new Map();

for (const audience of audiences) {
  // Dedupe within each list so a repeated ID cannot double-count.
  for (const id of new Set(audience)) {
    score.set(id, (score.get(id) || 0) + 1);
  }
}

const highIntentIds = [...score.entries()]
  .filter(([, count]) => count >= 3)
  .map(([id]) => id);

console.log(`${highIntentIds.length} high-intent accounts found`);
```

Example output:

```
9842 high-intent accounts found
```

At that point, hydrating fewer than 10,000 users makes more sense than hydrating hundreds of thousands.


Querying overlap in SQL

If you store edges in Postgres, a three-account overlap query can be simple:

```sql
SELECT follower_user_id
FROM twitter_follow_edges
WHERE source_user_id IN ('account_a', 'account_b', 'account_c')
GROUP BY follower_user_id
HAVING COUNT(DISTINCT source_user_id) = 3;
```

For pairwise overlap counts:

```sql
SELECT
  a.source_user_id AS account_a,
  b.source_user_id AS account_b,
  COUNT(*) AS shared_followers
FROM twitter_follow_edges a
JOIN twitter_follow_edges b
  ON a.follower_user_id = b.follower_user_id
 AND a.source_user_id < b.source_user_id
GROUP BY a.source_user_id, b.source_user_id
ORDER BY shared_followers DESC;
```

This is enough for a first version of an overlap dashboard.


When to hydrate profiles

Hydrate profiles when the graph has already narrowed the problem.

| Hydrate when... | Why |
|---|---|
| Users are in the overlap of several target audiences | They likely represent stronger category interest |
| Users are unique to a high-value creator | They may add incremental reach |
| Users were newly gained after a campaign | They can explain movement |
| Users appear across several niche lists | They may be lead candidates |
| Users are selected for export, scoring, or sales review | The profile data has an immediate use |
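When a segment does clear that bar, hydrate it in batches rather than one call per user. A sketch — `hydrateBatch` here is a stand-in for whatever bulk profile endpoint your provider offers, not a real TwitterAPI.io route:

```javascript
// Hydrate a selected segment in fixed-size batches.
// `hydrateBatch` is injected: it takes an array of IDs and should return
// an array of profile objects from your provider's bulk endpoint.
async function hydrateSegment(ids, hydrateBatch, batchSize = 100) {
  const profiles = [];
  for (let i = 0; i < ids.length; i += batchSize) {
    const batch = ids.slice(i, i + batchSize);
    profiles.push(...(await hydrateBatch(batch)));
  }
  return profiles;
}
```

Because the segment arrives as a plain array of IDs, the same function works whether the segment came from set math in JavaScript or a `GROUP BY` in Postgres.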

Bad hydration trigger:

> "We collected the ID, so we might as well fetch everything."

That is how small graph experiments turn into large bills.


A small dashboard idea

If I were building a weekend project around this, I would build an audience overlap dashboard:

  1. Input 3-10 Twitter/X usernames.
  2. Pull follower IDs for each account.
  3. Save edges in Postgres.
  4. Compute pairwise overlap, multi-way overlap, unique audience, and repeated followers.
  5. Hydrate only the top segments.
  6. Export CSVs for deeper research.

Example output:

| Creator | Followers | Unique followers | Shared with others | Incremental reach score |
|---|---|---|---|---|
| @founder_ai | 120,000 | 81,400 | 38,600 | 0.68 |
| @ml_builder | 85,000 | 52,900 | 32,100 | 0.62 |
| @agent_tools | 64,000 | 39,800 | 24,200 | 0.62 |

The dashboard does not need to answer every question.

It just needs to answer the first useful one:

Where is the audience overlap, and which users are worth inspecting next?


Tradeoffs

IDs-first is not always the right default.

| Use IDs-first when... | Use full profiles earlier when... |
|---|---|
| Graph structure is the main question | Your product needs immediate profile display |
| Lists are large | Filtering depends on bio or public metrics |
| IDs are cheaper than full profiles | You only collect small lists |
| You can enrich later | Your API charges the same for IDs and profiles |
| You need fast experimentation | Compliance rules affect stored relationship data |

But when graph structure is the main question, IDs-first is often cleaner and cheaper.


Final takeaway

The bigger lesson is not about one endpoint or one provider.

It is a general data engineering pattern:

> Collect cheap relationship data broadly.
> Analyze structure first.
> Enrich expensive objects selectively.

For follower graphs, that means:

**IDs first. Profiles later.**

That one design choice can make audience overlap, incremental reach analysis, weekly snapshots, and lead discovery much more practical.
