Olamide Olaniyan

Posted on Feb 17

Build a TikTok Audience Demographics Dashboard for Influencer Vetting

#python #webdev #tutorial #programming

You're about to spend $10,000 on an influencer deal. They have 500K followers and their content looks great. But here's the question nobody asks until it's too late:

Are their followers actually your target audience?

A beauty influencer with 80% male followers. A US-focused brand working with a creator whose audience is 90% from countries they don't ship to. A B2B SaaS spending money on a creator whose audience is entirely 13-17 year olds.

These are expensive mistakes. Let's prevent them.

We're building a TikTok Demographics Dashboard that:

Pulls real audience demographics for any TikTok creator
Shows age distribution, gender split, and geographic breakdown
Scores creator-brand fit based on your target audience
Compares multiple influencers side by side

Why Demographics Matter More Than Follower Count

The influencer marketing industry has a dirty secret: follower count tells you almost nothing about ROI.

What actually matters:

Geographic match: Are followers in markets where you operate?
Age alignment: Does the audience match your buyer persona?
Gender fit: Does the demographic match your product's market?
Audience quality: Are these real people or bots?

A 50K creator with perfect demographic alignment will outperform a 500K creator with the wrong audience every single time.

The Stack

Python: Language (with pandas for data analysis)
SociaVault API: TikTok demographics + profile endpoints
Matplotlib: Visualization
python-dotenv: Config

Step 1: Setup

mkdir tiktok-demographics
cd tiktok-demographics
pip install requests pandas matplotlib python-dotenv

Create .env:

SOCIAVAULT_API_KEY=your_key_here
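
Since every request depends on that key, it can help to fail fast when the .env file is missing or still holds the placeholder. A minimal sketch, assuming the variable name from the .env above; `require_api_key` is a hypothetical helper, not part of SociaVault or of the script that follows:

```python
import os


def require_api_key(env_var: str = "SOCIAVAULT_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper: call it after load_dotenv() so the .env values
    are already present in os.environ.
    """
    key = (os.getenv(env_var) or "").strip()
    # Treat the tutorial placeholder as "not configured".
    if not key or key == "your_key_here":
        raise RuntimeError(
            f"{env_var} is not set. Add it to .env before running the script."
        )
    return key
```

Calling it once at startup turns a confusing authorization failure later on into an immediate, readable error.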

Step 2: Fetch Demographics

Create demographics.py:

import os
import json
import requests
import pandas as pd
from dotenv import load_dotenv

load_dotenv()

API_BASE = "https://api.sociavault.com"
HEADERS = {"Authorization": f"Bearer {os.getenv('SOCIAVAULT_API_KEY')}"}


def get_demographics(username: str) -> dict:
    """Fetch audience demographics for a TikTok user."""

    print(f"📊 Fetching demographics for @{username}...")

    response = requests.get(
        f"{API_BASE}/v1/scrape/tiktok/demographics",
        params={"username": username},
        headers=HEADERS,
        timeout=30,  # requests has no default timeout; avoid hanging forever
    )
    response.raise_for_status()

    data = response.json().get("data", {})

    return data


def get_profile(username: str) -> dict:
    """Get basic profile info for context."""

    response = requests.get(
        f"{API_BASE}/v1/scrape/tiktok/profile",
        params={"username": username},
        headers=HEADERS,
        timeout=30,  # requests has no default timeout; avoid hanging forever
    )
    response.raise_for_status()

    return response.json().get("data", {})
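
Network calls like these can fail transiently (timeouts, rate limits). One way to handle that is a small retry wrapper with exponential backoff. The sketch below is a generic helper, not something the SociaVault API provides: `with_retries` and its parameters are hypothetical names, and this article does not say whether a failed request consumes credits, so keep the retry count low.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on an exception, wait and retry with exponential backoff.

    Hypothetical helper. Re-raises the last exception once `retries`
    attempts have failed.
    """
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: surface the original error
            # Delays grow as base_delay * 1, 2, 4, ...
            sleep(base_delay * (2 ** attempt))
    raise ValueError("retries must be at least 1")
```

Usage would look like `raw = with_retries(lambda: get_demographics("charlidamelio"))`. Catching every exception also retries errors that will never succeed, such as a bad API key, so in practice you would probably narrow the `except` to `requests.exceptions.ConnectionError` and `requests.exceptions.Timeout`.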

Step 3: Parse and Structure Demographics

def parse_demographics(raw_data: dict) -> dict:
    """Normalize demographics data into a clean structure."""

    demo = {
        "age_distribution": {},
        "gender_split": {},
        "top_countries": {},
        "top_cities": {},
        "languages": {},
        "device_types": {},
        "active_hours": {},
    }

    # Age groups
    if "age" in raw_data or "ageDistribution" in raw_data:
        ages = raw_data.get("age") or raw_data.get("ageDistribution", {})
        if isinstance(ages, dict):
            demo["age_distribution"] = ages
        elif isinstance(ages, list):
            for item in ages:
                key = item.get("range") or item.get("group") or item.get("label")
                val = item.get("percentage") or item.get("value") or item.get("share")
                if key and val:
                    demo["age_distribution"][key] = val

    # Gender
    if "gender" in raw_data or "genderDistribution" in raw_data:
        gender = raw_data.get("gender") or raw_data.get("genderDistribution", {})
        if isinstance(gender, dict):
            demo["gender_split"] = gender
        elif isinstance(gender, list):
            for item in gender:
                key = item.get("gender") or item.get("label")
                val = item.get("percentage") or item.get("value")
                if key and val:
                    demo["gender_split"][key] = val

    # Countries
    if "countries" in raw_data or "countryDistribution" in raw_data:
        countries = raw_data.get("countries") or raw_data.get("countryDistribution", [])
        if isinstance(countries, dict):
            demo["top_countries"] = countries
        elif isinstance(countries, list):
            for item in countries:
                key = item.get("country") or item.get("name") or item.get("code")
                val = item.get("percentage") or item.get("value") or item.get("share")
                if key and val:
                    demo["top_countries"][key] = val

    return demo


def print_demographics(username: str, demo: dict, profile: dict = None):
    """Pretty print demographics data."""

    print(f"\n{'═' * 55}")
    print(f"📊 AUDIENCE DEMOGRAPHICS: @{username}")
    print(f"{'═' * 55}")

    if profile:
        followers = profile.get("followers") or profile.get("followerCount", 0)
        following = profile.get("following") or profile.get("followingCount", 0)
        likes = profile.get("likes") or profile.get("heartCount", 0)
        print(f"\n  Followers: {followers:,}")
        print(f"  Following: {following:,}")
        print(f"  Total Likes: {likes:,}")

    # Gender
    if demo["gender_split"]:
        print(f"\n👤 GENDER:")
        print(f"  {'─' * 40}")
        for gender, pct in sorted(demo["gender_split"].items(), key=lambda x: -x[1]):
            bar = '█' * int(pct / 2)
            print(f"  {gender:<10} {pct:>5.1f}% {bar}")

    # Age distribution
    if demo["age_distribution"]:
        print(f"\n📅 AGE DISTRIBUTION:")
        print(f"  {'─' * 40}")
        for age_range, pct in demo["age_distribution"].items():
            bar = '█' * int(pct / 2)
            print(f"  {age_range:<10} {pct:>5.1f}% {bar}")

    # Countries
    if demo["top_countries"]:
        print(f"\n🌍 TOP COUNTRIES:")
        print(f"  {'─' * 40}")
        for country, pct in sorted(demo["top_countries"].items(), key=lambda x: -x[1])[:10]:
            bar = '█' * int(pct / 2)
            print(f"  {country:<15} {pct:>5.1f}% {bar}")

    print()
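
The age, gender, and country branches in parse_demographics all apply the same rule: accept either a dict of label to percentage, or a list of objects with varying key names. Here is a sketch of how that rule could be factored into one helper. The `normalize_distribution` name and the sample payload are made up for illustration; the real response shape may differ.

```python
from typing import Any, Sequence


def normalize_distribution(
    raw: Any,
    label_keys: Sequence[str],
    value_keys: Sequence[str],
) -> dict:
    """Turn a dict or a list of objects into {label: percentage}."""
    if isinstance(raw, dict):
        return dict(raw)
    result = {}
    if isinstance(raw, list):
        for item in raw:
            if not isinstance(item, dict):
                continue
            # The first label/value key that is present wins.
            label = next((item[k] for k in label_keys if item.get(k)), None)
            value = next(
                (item[k] for k in value_keys if item.get(k) is not None), None
            )
            if label is not None and value is not None:
                result[label] = value
    return result


# Hypothetical payload in the list form handled above.
sample_ages = [
    {"range": "18-24", "percentage": 41.5},
    {"group": "25-34", "value": 30.0},
    {"label": "35-44", "share": 0},  # a 0 share is kept rather than dropped
]
ages = normalize_distribution(
    sample_ages,
    label_keys=("range", "group", "label"),
    value_keys=("percentage", "value", "share"),
)
print(ages)
```

One deliberate difference from the code above: this version keeps entries whose value is 0, where the `if key and val` check would skip them.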

Step 4: Brand-Creator Fit Scoring

This is the money feature — automatically score how well a creator's audience matches your target:

def calculate_fit_score(
    demo: dict,
    target_countries: list = None,
    target_age_ranges: list = None,
    target_gender: str = None,
    target_gender_min_pct: float = None,
) -> dict:
    """Score how well a creator's audience matches your target."""

    scores = {}

    # Country fit (0-40 points)
    if target_countries and demo["top_countries"]:
        country_match = sum(
            demo["top_countries"].get(c, 0)
            for c in target_countries
        )
        scores["country_fit"] = {
            "score": min(40, country_match * 0.4),
            "max": 40,
            "detail": f"{country_match:.1f}% in target countries ({', '.join(target_countries)})"
        }

    # Age fit (0-30 points)
    if target_age_ranges and demo["age_distribution"]:
        age_match = sum(
            demo["age_distribution"].get(r, 0)
            for r in target_age_ranges
        )
        scores["age_fit"] = {
            "score": min(30, age_match * 0.3),
            "max": 30,
            "detail": f"{age_match:.1f}% in target age ranges ({', '.join(target_age_ranges)})"
        }

    # Gender fit (0-30 points)
    if target_gender and demo["gender_split"]:
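        # Match gender keys case-insensitively ("Female" vs "female")
        gender_lookup = {str(k).lower(): v for k, v in demo["gender_split"].items()}
        gender_pct = gender_lookup.get(target_gender.lower(), 0)
        min_threshold = 40 if target_gender_min_pct is None else target_gender_min_pct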

        if gender_pct >= min_threshold:
            gender_score = 30
        elif gender_pct >= min_threshold * 0.7:
            gender_score = 20
        elif gender_pct >= min_threshold * 0.5:
            gender_score = 10
        else:
            gender_score = 5

        scores["gender_fit"] = {
            "score": gender_score,
            "max": 30,
            "detail": f"{gender_pct:.1f}% {target_gender} (target: {min_threshold}%+)"
        }

    # Calculate total
    total_score = sum(s["score"] for s in scores.values())
    max_score = sum(s["max"] for s in scores.values())

    percentage = (total_score / max_score * 100) if max_score > 0 else 0

    # Grade
    if percentage >= 80:
        grade = "A"
        verdict = "Excellent fit"
    elif percentage >= 65:
        grade = "B"
        verdict = "Good fit"
    elif percentage >= 50:
        grade = "C"
        verdict = "Moderate fit — proceed with caution"
    elif percentage >= 35:
        grade = "D"
        verdict = "Poor fit — likely low ROI"
    else:
        grade = "F"
        verdict = "Bad fit — audience mismatch"

    return {
        "scores": scores,
        "total": total_score,
        "max": max_score,
        "percentage": percentage,
        "grade": grade,
        "verdict": verdict,
    }


def print_fit_report(username: str, fit: dict):
    """Print the fit score report."""

    grade_colors = {"A": "🟢", "B": "🟢", "C": "🟡", "D": "🔴", "F": "🔴"}
    icon = grade_colors.get(fit["grade"], "⚪")

    print(f"\n{'═' * 55}")
    print(f"🎯 BRAND FIT SCORE: @{username}")
    print(f"{'═' * 55}")

    print(f"\n  {icon} Grade: {fit['grade']} ({fit['percentage']:.0f}/100)")
    print(f"  📋 Verdict: {fit['verdict']}")

    print(f"\n  Score Breakdown:")
    print(f"  {'─' * 45}")
    for name, score in fit["scores"].items():
        bar_filled = int(score["score"] / score["max"] * 20)
        bar = '█' * bar_filled + '░' * (20 - bar_filled)
        label = name.replace('_', ' ').title()
        print(f"  {label:<15} [{bar}] {score['score']:.0f}/{score['max']}")
        print(f"  {'':15} {score['detail']}")

    print()
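
To see how the weights combine, here is the arithmetic for a made-up audience: 62% in target countries, 70% in target age ranges, and 55% in the target gender with the default 40% threshold. The snippet restates the formulas from calculate_fit_score in standalone form; `fit_points` is a name chosen for this example and the input numbers are illustrative, not real creator data.

```python
def fit_points(country_pct=None, age_pct=None, gender_pct=None, gender_min=40):
    """Standalone restatement of the scoring weights above.

    Returns (points earned, points possible). Criteria left as None are
    skipped, so the possible total shrinks with them.
    """
    earned, possible = 0.0, 0
    if country_pct is not None:
        earned += min(40, country_pct * 0.4)  # 0-40 points
        possible += 40
    if age_pct is not None:
        earned += min(30, age_pct * 0.3)  # 0-30 points
        possible += 30
    if gender_pct is not None:
        if gender_pct >= gender_min:
            earned += 30
        elif gender_pct >= gender_min * 0.7:
            earned += 20
        elif gender_pct >= gender_min * 0.5:
            earned += 10
        else:
            earned += 5
        possible += 30
    return earned, possible


earned, possible = fit_points(country_pct=62, age_pct=70, gender_pct=55)
print(f"{earned:.1f}/{possible} = {earned / possible * 100:.1f}%")
# 24.8 + 21.0 + 30 = 75.8 of 100 points, which lands in the B band (65 up to 80)

earned, possible = fit_points(country_pct=62)
print(f"{earned:.1f}/{possible} = {earned / possible * 100:.1f}%")
# 24.8 of 40 points = 62.0%, which lands in the C band (50 up to 65)
```

Because the percentage is taken over only the criteria you supply, a creator scored on country alone is not directly comparable with one scored on all three.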

Step 5: Compare Multiple Creators

def compare_creators(
    usernames: list,
    target_countries: list = None,
    target_age_ranges: list = None,
    target_gender: str = None,
) -> pd.DataFrame:
    """Compare demographics and fit scores across multiple creators."""

    print(f"\n📊 Comparing {len(usernames)} creators...\n")

    results = []

    for username in usernames:
        try:
            profile = get_profile(username)
            raw_demo = get_demographics(username)
            demo = parse_demographics(raw_demo)

            fit = calculate_fit_score(
                demo,
                target_countries=target_countries,
                target_age_ranges=target_age_ranges,
                target_gender=target_gender,
            )

            followers = profile.get("followers") or profile.get("followerCount", 0)
            eng_rate = profile.get("engagementRate", 0)

            results.append({
                "Creator": f"@{username}",
                "Followers": followers,
                "Eng Rate": f"{eng_rate:.2f}%" if eng_rate else "N/A",
                "Fit Grade": fit["grade"],
                "Fit Score": f"{fit['percentage']:.0f}/100",
                "Top Country": max(demo["top_countries"].items(), key=lambda x: x[1])[0] if demo["top_countries"] else "N/A",
                "Top Age": max(demo["age_distribution"].items(), key=lambda x: x[1])[0] if demo["age_distribution"] else "N/A",
            })

            print(f"  ✓ @{username}: Grade {fit['grade']} ({fit['percentage']:.0f}/100)")

        except Exception as e:
            print(f"  ✗ @{username}: {e}")
            results.append({"Creator": f"@{username}", "Error": str(e)})

    df = pd.DataFrame(results)

    print(f"\n{'═' * 70}")
    print(f"📊 CREATOR COMPARISON")
    print(f"{'═' * 70}")
    print(df.to_string(index=False))

    return df
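
The comparison table comes out in input order, and Fit Score is stored as a string such as "76/100", so sorting on it directly would compare text rather than numbers. A small sketch of ranking the rows numerically, with failed lookups last; `rank_by_fit` is a hypothetical helper that works on the same list-of-dicts shape as `results`:

```python
def rank_by_fit(rows: list) -> list:
    """Sort comparison rows by numeric fit score, highest first.

    Rows without a "Fit Score" (for example, ones that recorded an
    "Error") sort to the end.
    """
    def score(row: dict) -> float:
        raw = row.get("Fit Score")
        if not raw:
            return -1.0
        # "76/100" -> 76.0
        return float(str(raw).split("/")[0])

    return sorted(rows, key=score, reverse=True)


# Made-up rows in the shape compare_creators builds.
rows = [
    {"Creator": "@creator1", "Fit Grade": "C", "Fit Score": "58/100"},
    {"Creator": "@creator2", "Error": "404 Client Error"},
    {"Creator": "@creator3", "Fit Grade": "A", "Fit Score": "84/100"},
    {"Creator": "@creator4", "Fit Grade": "F", "Fit Score": "9/100"},
]
for row in rank_by_fit(rows):
    print(row["Creator"], row.get("Fit Score", "error"))
```

Inside compare_creators this could be applied as `pd.DataFrame(rank_by_fit(results))` so the strongest candidates appear at the top of the table.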

Step 6: Visualization

def visualize_demographics(username: str, demo: dict, output_file: str = None):
    """Generate visual charts from demographics data."""

    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, 3, figsize=(16, 5))
    fig.suptitle(f"@{username} — Audience Demographics", fontsize=14, fontweight='bold')

    # Age distribution
    if demo["age_distribution"]:
        ax = axes[0]
        ages = dict(sorted(demo["age_distribution"].items()))
        ax.barh(list(ages.keys()), list(ages.values()), color='#6366f1')
        ax.set_xlabel("Percentage")
        ax.set_title("Age Distribution")
        for i, v in enumerate(ages.values()):
            ax.text(v + 0.5, i, f"{v:.1f}%", va='center', fontsize=9)

    # Gender split
    if demo["gender_split"]:
        ax = axes[1]
        colors = {'Male': '#3b82f6', 'Female': '#ec4899', 'Other': '#8b5cf6',
                  'male': '#3b82f6', 'female': '#ec4899'}
        genders = demo["gender_split"]
        vals = list(genders.values())
        labels = [f"{k}\n{v:.1f}%" for k, v in genders.items()]
        ax.pie(vals, labels=labels,
               colors=[colors.get(k, '#94a3b8') for k in genders.keys()],
               startangle=90, textprops={'fontsize': 10})
        ax.set_title("Gender Split")

    # Top countries
    if demo["top_countries"]:
        ax = axes[2]
        countries = dict(sorted(demo["top_countries"].items(), key=lambda x: -x[1])[:8])
        ax.barh(list(countries.keys()), list(countries.values()), color='#10b981')
        ax.set_xlabel("Percentage")
        ax.set_title("Top Countries")
        for i, v in enumerate(countries.values()):
            ax.text(v + 0.3, i, f"{v:.1f}%", va='center', fontsize=9)

    plt.tight_layout()

    if output_file:
        plt.savefig(output_file, dpi=150, bbox_inches='tight')
        plt.close(fig)  # free the figure so repeated reports don't pile up in memory
        print(f"  📈 Chart saved: {output_file}")
    else:
        plt.show()


def generate_full_report(username: str, target_config: dict = None):
    """Generate a complete demographics report with fit scoring."""

    profile = get_profile(username)
    raw_demo = get_demographics(username)
    demo = parse_demographics(raw_demo)

    print_demographics(username, demo, profile)

    fit = None
    if target_config:
        fit = calculate_fit_score(demo, **target_config)
        print_fit_report(username, fit)

    visualize_demographics(username, demo, f"{username}_demographics.png")

    # Include the fit result (None when no targets were given) for callers
    return {"profile": profile, "demographics": demo, "fit": fit}

Step 7: CLI

def main():
    import sys

    if len(sys.argv) < 3:  # every command needs at least one argument after it
        print("TikTok Demographics Dashboard\n")
        print("Usage:")
        print("  python demographics.py check <username>")
        print("  python demographics.py fit <username> --countries US,UK --ages 18-24,25-34 --gender Female")
        print("  python demographics.py compare user1,user2,user3")
        print()
        print("Note: Demographics lookups cost 20-39 credits each.")
        print("Use wisely — vet shortlisted creators, not your entire list.")
        return

    command = sys.argv[1]

    if command == "check":
        username = sys.argv[2].lstrip("@")
        generate_full_report(username)

    elif command == "fit":
        username = sys.argv[2].lstrip("@")
        config = {}

        if "--countries" in sys.argv:
            idx = sys.argv.index("--countries")
            config["target_countries"] = sys.argv[idx + 1].split(",")

        if "--ages" in sys.argv:
            idx = sys.argv.index("--ages")
            config["target_age_ranges"] = sys.argv[idx + 1].split(",")

        if "--gender" in sys.argv:
            idx = sys.argv.index("--gender")
            config["target_gender"] = sys.argv[idx + 1]

        generate_full_report(username, target_config=config)

    elif command == "compare":
        usernames = sys.argv[2].split(",")
        usernames = [u.strip().lstrip("@") for u in usernames]

        config = {}
        if "--countries" in sys.argv:
            idx = sys.argv.index("--countries")
            config["target_countries"] = sys.argv[idx + 1].split(",")
        if "--ages" in sys.argv:
            idx = sys.argv.index("--ages")
            config["target_age_ranges"] = sys.argv[idx + 1].split(",")
        if "--gender" in sys.argv:
            idx = sys.argv.index("--gender")
            config["target_gender"] = sys.argv[idx + 1]

        compare_creators(usernames, **config)

    else:
        print(f"Unknown command: {command}")


if __name__ == "__main__":
    main()
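
The manual sys.argv handling above raises an IndexError when a flag is the last argument, and it repeats the same option parsing for fit and compare. Here is a sketch of the same interface using the standard library's argparse. The command and flag names mirror the ones above, while `build_parser` and `csv_list` are names chosen for this example:

```python
import argparse


def csv_list(value: str) -> list:
    """Split "US,CA" into ["US", "CA"], dropping empty entries."""
    return [part.strip() for part in value.split(",") if part.strip()]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="demographics.py")
    sub = parser.add_subparsers(dest="command", required=True)

    # Options shared by the "fit" and "compare" commands.
    targets = argparse.ArgumentParser(add_help=False)
    targets.add_argument("--countries", type=csv_list, dest="target_countries")
    targets.add_argument("--ages", type=csv_list, dest="target_age_ranges")
    targets.add_argument("--gender", dest="target_gender")

    check = sub.add_parser("check")
    check.add_argument("username")

    fit = sub.add_parser("fit", parents=[targets])
    fit.add_argument("username")

    compare = sub.add_parser("compare", parents=[targets])
    compare.add_argument("usernames", type=csv_list)

    return parser


args = build_parser().parse_args(
    ["fit", "@charlidamelio", "--countries", "US,CA", "--gender", "Female"]
)
# Keep only the target options that were actually supplied.
config = {
    k: v for k, v in vars(args).items()
    if k.startswith("target_") and v is not None
}
print(args.username.lstrip("@"), config)
```

The `dest` names match the keyword arguments of calculate_fit_score and compare_creators, so `config` can be passed along with `**config` as before. argparse also prints a usage message and exits on missing or unknown arguments.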

Running It

# Quick demographics check
python demographics.py check charlidamelio

# Fit scoring for a DTC beauty brand targeting US and Canadian women aged 18-34
python demographics.py fit charlidamelio --countries US,CA --ages 18-24,25-34 --gender Female

# Compare 3 shortlisted creators
python demographics.py compare "creator1,creator2,creator3" --countries US --ages 18-24,25-34

A Note on Credits

Demographics lookups cost 20-39 credits per creator. This is not a tool for scanning thousands of creators. Use it strategically:

Build a shortlist using profile data (1 credit each)
Check engagement rates and content quality manually
Only run demographics on your top 5-10 finalists

Think of it like a detailed background check. You don't run it on every applicant — just the ones you're seriously considering.
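
Because each demographics lookup is billed, it may be worth caching responses on disk so that re-running a report does not spend credits twice. A minimal sketch, assuming a local .cache directory and a 7-day freshness window (both arbitrary choices); `cached_lookup` is a hypothetical helper that wraps any fetch function, such as get_demographics:

```python
import json
import time
from pathlib import Path
from typing import Callable


def cached_lookup(
    username: str,
    fetch: Callable[[str], dict],
    cache_dir: str = ".cache",
    max_age_seconds: float = 7 * 24 * 3600,
) -> dict:
    """Return cached data for username if fresh, else call fetch and store it."""
    path = Path(cache_dir) / f"{username.lower()}.json"
    if path.exists() and time.time() - path.stat().st_mtime < max_age_seconds:
        return json.loads(path.read_text(encoding="utf-8"))

    data = fetch(username)  # only this branch calls the API
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data), encoding="utf-8")
    return data
```

With this in place, `cached_lookup(username, get_demographics)` would call the API the first time and read from disk on later runs within the freshness window.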

The ROI Math

Let's say you're evaluating 5 influencers for a $5,000 campaign:

Scenario	Demographics Cost	Outcome
Without vetting	$0	Risk $5,000 on wrong audience
With vetting	~$3-5	Confirm audience match first

Even one avoided misfire pays for thousands of demographics lookups.
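
That claim can be checked with quick arithmetic. The sketch below uses the $5,000 campaign from the table and the approximate per-lookup price range of $0.20 to $0.40 quoted in this article's cost comparison. It works in whole cents to avoid floating-point rounding.

```python
CAMPAIGN_COST_CENTS = 5_000 * 100   # the $5,000 campaign above
LOOKUP_PRICE_CENTS = (20, 40)       # approx. $0.20 to $0.40 per lookup


def lookups_covered(budget_cents: int, price_cents: int) -> int:
    """How many lookups the given budget would pay for."""
    return budget_cents // price_cents


low_price, high_price = LOOKUP_PRICE_CENTS
print(lookups_covered(CAMPAIGN_COST_CENTS, high_price))  # 12500
print(lookups_covered(CAMPAIGN_COST_CENTS, low_price))   # 25000
```

So at those prices, one avoided $5,000 misfire covers roughly 12,500 to 25,000 lookups.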

Cost Comparison

Tool	Monthly Cost	TikTok Demographics
HypeAuditor	$299/mo	Estimated (not real data)
Modash	$120/mo	Sample-based
CreatorIQ	Enterprise pricing	Requires onboarding
This tool	~$0.20-0.40/lookup	On-demand, pay per use

The key difference: most platforms show estimated demographics based on content analysis. The SociaVault endpoint pulls from TikTok's own analytics, which means the data is as accurate as what the creator sees in their own dashboard.

Get Started

Get your API key at sociavault.com
Pick a creator you've been considering
Run the demographics check before you sign the deal

Ten seconds of vetting can save you thousands in wasted influencer spend.

Follower count is vanity. Demographics are strategy. Check the audience before you pay the creator.

#python #tiktok #influencermarketing #datascience

DEV Community