Most audience-analysis projects start with one deceptively simple question:
Which people are shared across these audiences, and which people are unique?
That question sounds like marketing.
In practice, it is a data engineering problem.
If you are analyzing creators, competitors, communities, conferences, or niche accounts, the expensive mistake is usually the same:
You hydrate every profile too early.
You pull usernames, bios, avatars, follower counts, descriptions, and other profile fields before you know which users are actually worth inspecting.
For many social graph workflows, the better pattern is:
IDs first. Profiles later.
Collect the graph as raw IDs. Run set operations. Find the interesting segments. Hydrate only the users that survive that first pass.
This post walks through that pattern using Twitter/X-style follower data as the example.
The same idea applies to any API or dataset where:
- ID-only edges are cheaper than full objects,
- relationships matter more than profile fields at first,
- you can enrich selected IDs later,
- and your real goal is overlap, uniqueness, change detection, or lead scoring.
Core pattern
Collect cheap relationship data broadly, analyze structure first, and enrich expensive objects selectively.
A follower list is less useful as a flat list of names and more useful as a relationship map you can query.
In this post
- The core idea
- Why full profiles are often premature
- The architecture
- A concrete cost model
- Minimal data model
- Collecting follower IDs
- Use cases
- SQL overlap queries
- Tradeoffs
TL;DR
| Step | What to do | Why it helps |
|---|---|---|
| 1 | Pull follower IDs | Collect graph edges cheaply |
| 2 | Store raw relationships | Keep the data model simple |
| 3 | Run set operations | Find overlap, uniqueness, and repeated appearances |
| 4 | Pick useful segments | Avoid enriching low-signal users |
| 5 | Hydrate profiles later | Spend API calls only where they matter |
Rule of thumb
If your first question is about relationships, do not start by fetching profile metadata.
This reduces cost, makes experiments faster, and keeps your data model cleaner.
Why full profiles are often premature
A full follower profile is useful when you need to display or filter by:
- username,
- display name,
- bio,
- avatar,
- follower count,
- location,
- public metrics.
But many first-pass questions do not need any of those fields.
For example:
- Which users follow both creator A and creator B?
- Which followers are shared by three competitors?
- Which creator has the most incremental reach?
- Which accounts appeared in this week's snapshot but not last week's?
- Which IDs show up across multiple high-intent communities?
Those questions need relationships, not bios.
Once you treat the problem as graph analysis, the data you need first is much smaller:
source_account_id -> follower_user_id
That edge is enough to compute a surprising amount of value.
The architecture
Think of the pipeline in three stages:
| Stage | Input | Output |
|---|---|---|
| Collect IDs | Target accounts | source_account_id -> follower_user_id edges |
| Analyze graph | Raw ID lists | overlap, uniqueness, frequency, growth, decay |
| Hydrate selected profiles | High-signal IDs | usernames, bios, metrics, exports |
Get followers -> store edges -> find signals -> hydrate later
That separation is the whole trick.
It keeps graph collection cheap and broad, while keeping profile enrichment focused and narrow.
The graph tells you who is worth inspecting before you pay to hydrate full profiles.
A concrete cost model
To make the math less abstract, here is one real pricing example.
TwitterAPI.io added an IDs-only follower endpoint in May 2026. Their docs show that /twitter/user/followers_ids can return up to 5,000 follower IDs per call, with the largest tier priced at 0.45 credits per ID.
They also document:
100,000 credits = $1
1 credit = $0.00001
So a 5,000-ID page costs:
5,000 IDs * 0.45 credits = 2,250 credits
2,250 credits = $0.0225
Now compare three ways to map an account with 50,000 followers:
| Approach | What you collect | Approx cost |
|---|---|---|
| Old full-profile pull | 50,000 full profiles at 15 credits each | $7.50 |
| New full-profile pull | 50,000 full profiles at 1 credit each | $0.50 |
| IDs-first pull | 50,000 raw follower IDs | $0.225 |
The exact numbers will depend on your provider and plan. The point is the shape of the economics:
If IDs are cheaper than profiles, do graph work before profile work.
Cost-model assumptions
This example uses one public API pricing model as a concrete reference point. The important takeaway is not the exact vendor or exact number. The important takeaway is the relationship between cheap edge collection and more expensive object hydration.
If your provider prices IDs and full profiles differently, model the same workflow in three columns:
| Question | What to estimate |
|---|---|
| How many IDs do I need? | target accounts * followers per account |
| How many profiles do I actually inspect? | overlap segment, unique segment, high-frequency segment |
| What happens if I hydrate everything first? | total IDs * full-profile price |
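The three-column estimate above can be sketched in a few lines of JavaScript. The credit prices here are the example figures from this post, not universal constants; swap in your provider's actual numbers.

```javascript
// Example prices from the cost model above -- assumptions, not a real rate card.
const CREDITS_PER_USD = 100_000;
const CREDITS_PER_ID = 0.45; // IDs-only endpoint, largest tier
const CREDITS_PER_PROFILE = 1; // full-profile endpoint

function usd(credits) {
  return credits / CREDITS_PER_USD;
}

function estimate({ accounts, followersPerAccount, inspectedProfiles }) {
  const totalIds = accounts * followersPerAccount;
  return {
    // Collect all edges cheaply, hydrate only the surviving segment.
    idsFirst: usd(totalIds * CREDITS_PER_ID + inspectedProfiles * CREDITS_PER_PROFILE),
    // Hydrate every follower up front.
    hydrateEverything: usd(totalIds * CREDITS_PER_PROFILE),
  };
}

// One account with 50,000 followers, hydrating a 5,000-user segment:
console.log(estimate({ accounts: 1, followersPerAccount: 50_000, inspectedProfiles: 5_000 }));
// -> { idsFirst: 0.275, hydrateEverything: 0.5 }
```

Note that `idsFirst` includes the later hydration of the selected segment, so it is slightly higher than the pure ID-collection cost in the table above.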
Minimal data model
You do not need a graph database on day one.
A relational table can take you far:
CREATE TABLE twitter_follow_edges (
source_user_id TEXT NOT NULL,
follower_user_id TEXT NOT NULL,
collected_at TIMESTAMP NOT NULL,
PRIMARY KEY (source_user_id, follower_user_id)
);
Use TEXT for IDs. Twitter/X IDs can exceed JavaScript's safe integer range, and many APIs return them as strings for that reason.
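A quick demonstration of why: parsing a 19-digit ID into a Number silently loses precision, while keeping it as a string (or BigInt) does not.

```javascript
// A Twitter/X-scale snowflake ID (19 digits) exceeds
// Number.MAX_SAFE_INTEGER (2^53 - 1), so Number() rounds it
// to the nearest representable double.
const id = "1234567890123456789";

console.log(Number.isSafeInteger(Number(id))); // false
console.log(String(Number(id)) === id); // false: digits were lost
console.log(BigInt(id).toString() === id); // true: BigInt is exact
```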
You can add hydrated profile data separately:
CREATE TABLE twitter_user_profiles (
user_id TEXT PRIMARY KEY,
username TEXT,
display_name TEXT,
bio TEXT,
followers_count INTEGER,
hydrated_at TIMESTAMP NOT NULL
);
Keeping edges and profiles separate makes the pipeline easier to reason about:
- edge collection answers graph questions,
- profile hydration answers identity and filtering questions.
Design note
Keep relationship collection and profile enrichment as separate jobs. That makes retries, deduplication, cost tracking, and data retention much easier to reason about.
Collecting follower IDs
A typical IDs-only request looks like this:
curl "https://api.twitterapi.io/twitter/user/followers_ids?userName=elonmusk&count=5000" \
-H "X-API-Key: YOUR_API_KEY"
The important detail is pagination.
Store next_cursor and keep fetching until the response says there are no more pages.
JavaScript collector example
async function collectFollowerIds(userName) {
let cursor = "";
const ids = [];
while (true) {
const url = new URL("https://api.twitterapi.io/twitter/user/followers_ids");
url.searchParams.set("userName", userName);
url.searchParams.set("count", "5000");
if (cursor) {
url.searchParams.set("cursor", cursor);
}
const response = await fetch(url, {
headers: {
"X-API-Key": process.env.TWITTERAPI_IO_KEY,
},
});
if (!response.ok) {
throw new Error(`Request failed: ${response.status}`);
}
const data = await response.json();
ids.push(...data.ids);
if (!data.has_next_page || !data.next_cursor || data.next_cursor === "0") {
break;
}
cursor = data.next_cursor;
}
return ids;
}
Production checklist
- retry with exponential backoff,
- request logging,
- rate-limit handling,
- cursor checkpointing,
- deduplication before writing,
- a collection_job_id for traceability.
But the core loop is small.
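For the first checklist item, a minimal retry wrapper might look like the sketch below. The retry count and delays are illustrative defaults, not provider recommendations.

```javascript
// Generic retry helper with exponential backoff. `fn` is any async
// function, e.g. one page fetch from the collector loop above.
async function withRetries(fn, { maxAttempts = 5, baseDelayMs = 500 } = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxAttempts) throw err;
      // 500ms, 1s, 2s, 4s, ... plus jitter to avoid thundering herds.
      const delay = baseDelayMs * 2 ** (attempt - 1) + Math.random() * 100;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Usage would be wrapping each page fetch, e.g. `await withRetries(() => fetchPage(cursor))`, where `fetchPage` is whatever function performs a single request in your collector.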
Use-case map
| Use case | Graph question | First segment to hydrate |
|---|---|---|
| Audience overlap | Who follows multiple accounts? | shared followers |
| Incremental reach | Who is unique to one creator? | unique followers |
| Competitive research | Who follows several tools in a category? | multi-competitor followers |
| Weekly snapshots | Who appeared or disappeared? | newly gained or lost followers |
| Lead discovery | Who repeats across high-intent lists? | high-frequency IDs |
The examples below all use the same principle: find the signal with IDs first, then enrich the segment that matters.
Use case 1: audience overlap
Suppose you are researching three AI-focused accounts:
@founder_ai
@ml_builder
@agent_tools
Toy dataset:
| Account | Followers collected |
|---|---|
| @founder_ai | 120,000 |
| @ml_builder | 85,000 |
| @agent_tools | 64,000 |
Once you have ID lists, overlap is just set math:
Set-intersection example
const founderAI = new Set(founderAiFollowerIds);
const mlBuilder = new Set(mlBuilderFollowerIds);
const agentTools = new Set(agentToolsFollowerIds);
function intersection(a, b) {
return new Set([...a].filter((id) => b.has(id)));
}
function threeWayIntersection(a, b, c) {
return new Set([...a].filter((id) => b.has(id) && c.has(id)));
}
const founderAndML = intersection(founderAI, mlBuilder);
const founderAndAgent = intersection(founderAI, agentTools);
const mlAndAgent = intersection(mlBuilder, agentTools);
const allThree = threeWayIntersection(founderAI, mlBuilder, agentTools);
console.log({
founderAndML: founderAndML.size,
founderAndAgent: founderAndAgent.size,
mlAndAgent: mlAndAgent.size,
allThree: allThree.size,
});
Example output:
{
"founderAndML": 18420,
"founderAndAgent": 12380,
"mlAndAgent": 10955,
"allThree": 6210
}
The 6,210 users following all three accounts are probably more interesting than the average follower.
That is the segment I would hydrate first.
What this gives you
A smaller, higher-intent segment that is easier to inspect, export, score, or enrich.
Use case 2: incremental reach
Follower count alone is a weak metric.
If two creators have the same audience, buying both sponsorships may not add much reach. A smaller creator with a more unique audience can be more valuable.
You can estimate uniqueness like this:
Unique-audience example
function uniqueToA(a, b, c) {
return new Set([...a].filter((id) => !b.has(id) && !c.has(id)));
}
const uniqueFounderAI = uniqueToA(founderAI, mlBuilder, agentTools);
const uniqueMLBuilder = uniqueToA(mlBuilder, founderAI, agentTools);
const uniqueAgentTools = uniqueToA(agentTools, founderAI, mlBuilder);
console.log({
uniqueFounderAI: uniqueFounderAI.size,
uniqueMLBuilder: uniqueMLBuilder.size,
uniqueAgentTools: uniqueAgentTools.size,
});
Example output:
{
"uniqueFounderAI": 81400,
"uniqueMLBuilder": 52900,
"uniqueAgentTools": 39800
}
Now you can rank creators by incremental reach, not just total followers.
Use case 3: competitive audience research
For competitive research, graph data can tell you how users relate to a category.
Imagine you are building a SaaS product for sales teams. You collect follower IDs for:
@crm_tool_a
@sales_ai_b
@pipeline_app_c
Then segment users by relationship:
| Segment | Possible interpretation |
|---|---|
| Follows only competitor A | Possibly loyal to one tool |
| Follows A and B | Comparing alternatives |
| Follows all competitors | Strong category interest |
| Follows competitor + analyst | Higher-intent audience |
| Follows competitor but not your brand | Potential acquisition audience |
None of this requires profile bios at the first step.
The graph gets you to a shortlist.
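The segmentation table above is plain set membership. Here is a sketch with placeholder ID sets; in practice the sets come from the collector, and the labels are the interpretations from the table.

```javascript
// Placeholder ID sets -- in practice these come from collected edges.
const competitorA = new Set(["u1", "u2", "u3", "u5"]);
const competitorB = new Set(["u2", "u3", "u6"]);
const yourBrand = new Set(["u3"]);

function segmentUser(id) {
  const inA = competitorA.has(id);
  const inB = competitorB.has(id);
  const inBrand = yourBrand.has(id);
  if (inA && inB && !inBrand) return "potential acquisition audience";
  if (inA && inB) return "strong category interest";
  if (inA !== inB) return "follows only one competitor";
  return "outside category";
}

console.log(segmentUser("u2")); // "potential acquisition audience"
console.log(segmentUser("u3")); // "strong category interest"
console.log(segmentUser("u1")); // "follows only one competitor"
```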
Why this matters
Competitive follower data becomes more useful when you treat it as category-intent data, not as a flat export.
Use case 4: weekly audience snapshots
IDs are also useful for tracking changes over time.
For a creator CRM or audience analytics product, you can collect weekly snapshots:
snapshot_2026_05_01
snapshot_2026_05_08
snapshot_2026_05_15
Then compare sets:
Snapshot comparison example
function difference(a, b) {
return new Set([...a].filter((id) => !b.has(id)));
}
const gained = difference(currentWeekFollowers, lastWeekFollowers);
const lost = difference(lastWeekFollowers, currentWeekFollowers);
console.log({
gainedFollowers: gained.size,
lostFollowers: lost.size,
netChange: gained.size - lost.size,
});
Example output:
{
"gainedFollowers": 8421,
"lostFollowers": 1960,
"netChange": 6461
}
That can help you detect:
- which campaign drove new followers,
- whether an audience changed after a launch,
- whether growth is coming from category-relevant users,
- which new followers should be enriched and scored.
Again: IDs first, profiles later.
Use case 5: lead discovery
For B2B lead discovery, the interesting users are rarely "all followers."
They are users matching patterns like:
- follows 3+ accounts in your niche,
- follows your competitor,
- follows a relevant conference,
- follows a known category analyst,
- recently followed a high-intent account.
One simple scoring approach:
Frequency-scoring example
const audiences = [
founderAiFollowerIds,
mlBuilderFollowerIds,
agentToolsFollowerIds,
aiConferenceFollowerIds,
];
const score = new Map();
for (const audience of audiences) {
for (const id of audience) {
score.set(id, (score.get(id) || 0) + 1);
}
}
const highIntentIds = [...score.entries()]
.filter(([, count]) => count >= 3)
.map(([id]) => id);
console.log(`${highIntentIds.length} high-intent accounts found`);
Example output:
9842 high-intent accounts found
At that point, hydrating fewer than 10,000 users makes more sense than hydrating hundreds of thousands.
Querying overlap in SQL
If you store edges in Postgres, a three-account overlap query can be simple:
SELECT follower_user_id
FROM twitter_follow_edges
WHERE source_user_id IN ('account_a', 'account_b', 'account_c')
GROUP BY follower_user_id
HAVING COUNT(DISTINCT source_user_id) = 3;
For pairwise overlap counts:
SELECT
a.source_user_id AS account_a,
b.source_user_id AS account_b,
COUNT(*) AS shared_followers
FROM twitter_follow_edges a
JOIN twitter_follow_edges b
ON a.follower_user_id = b.follower_user_id
AND a.source_user_id < b.source_user_id
GROUP BY a.source_user_id, b.source_user_id
ORDER BY shared_followers DESC;
This is enough for a first version of an overlap dashboard.
When to hydrate profiles
Hydrate profiles when the graph has already narrowed the problem.
| Hydrate when... | Why |
|---|---|
| Users are in the overlap of several target audiences | They likely represent stronger category interest |
| Users are unique to a high-value creator | They may add incremental reach |
| Users were newly gained after a campaign | They can explain movement |
| Users appear across several niche lists | They may be lead candidates |
| Users are selected for export, scoring, or sales review | The profile data has an immediate use |
Bad hydration trigger:
"We collected the ID, so we might as well fetch everything."
That is how small graph experiments turn into large bills.
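Once a segment has earned hydration, fetch profiles in batches rather than one call per user. The batch size and the `fetchBatch` function below are placeholders; check your provider's docs for its real bulk-lookup endpoint and limits.

```javascript
// Split surviving IDs into fixed-size chunks so each request stays
// under a hypothetical 100-IDs-per-call limit.
function chunk(ids, size = 100) {
  const batches = [];
  for (let i = 0; i < ids.length; i += size) {
    batches.push(ids.slice(i, i + size));
  }
  return batches;
}

async function hydrateProfiles(ids, fetchBatch) {
  const profiles = [];
  for (const batch of chunk(ids)) {
    // `fetchBatch` wraps whatever bulk user-lookup call your provider offers.
    profiles.push(...(await fetchBatch(batch)));
  }
  return profiles;
}
```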
A small dashboard idea
If I were building a weekend project around this, I would build an audience overlap dashboard:
- Input 3-10 Twitter/X usernames.
- Pull follower IDs for each account.
- Save edges in Postgres.
- Compute pairwise overlap, multi-way overlap, unique audience, and repeated followers.
- Hydrate only the top segments.
- Export CSVs for deeper research.
Example output:
| Creator | Followers | Unique followers | Shared with others | Incremental reach score |
|---|---|---|---|---|
| @founder_ai | 120,000 | 81,400 | 38,600 | 0.68 |
| @ml_builder | 85,000 | 52,900 | 32,100 | 0.62 |
| @agent_tools | 64,000 | 39,800 | 24,200 | 0.62 |
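The incremental reach score here is simply unique followers divided by total followers collected, rounded to two decimals. As a sketch:

```javascript
// Score = share of an account's collected followers that no other
// tracked account reaches. Inputs match the toy dataset above.
function incrementalReachScore(uniqueFollowers, totalFollowers) {
  return Number((uniqueFollowers / totalFollowers).toFixed(2));
}

console.log(incrementalReachScore(81_400, 120_000)); // 0.68
console.log(incrementalReachScore(52_900, 85_000)); // 0.62
console.log(incrementalReachScore(39_800, 64_000)); // 0.62
```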
The dashboard does not need to answer every question.
It just needs to answer the first useful one:
Where is the audience overlap, and which users are worth inspecting next?
Tradeoffs
IDs-first is not always the right default.
| Use IDs-first when... | Use full profiles earlier when... |
|---|---|
| graph structure is the main question | your product needs immediate profile display |
| lists are large | filtering depends on bio or public metrics |
| IDs are cheaper than full profiles | you only collect small lists |
| you can enrich later | your API charges the same for IDs and profiles |
| you need fast experimentation | compliance rules affect stored relationship data |
But when graph structure is the main question, IDs-first is often cleaner and cheaper.
Final takeaway
The bigger lesson is not about one endpoint or one provider.
It is a general data engineering pattern:
Collect cheap relationship data broadly.
Analyze structure first.
Enrich expensive objects selectively.
For follower graphs, that means:
IDs first. Profiles later.
That one design choice can make audience overlap, incremental reach analysis, weekly snapshots, and lead discovery much more practical.
Reference links:
- TwitterAPI.io pricing update: https://twitterapi.io/blog/bulk-follower-pricing-90-percent-cheaper
- Followers IDs docs: https://docs.twitterapi.io/api-reference/endpoint/get_user_followers_ids
- API docs: https://docs.twitterapi.io

