Olga Braginskaya

Originally published at datobra.com

Build in Public: Week 7. Shipping While Tired (or: Still Alive, Surprisingly)

This week was quiet.

Not because nothing is happening, but because after six weeks of pushing, adding platforms, adding metrics, debugging queues and arguing with reality, we’re tired. The kind of tired where you stop dreaming about new features and start dreaming about “everything still works tomorrow”.

And honestly that’s fine.

Week 7 ended up being a regrouping week. Less invention, more wiring things into something that feels like a system. The main theme was: if Wykra is about finding creators and understanding what’s real vs fake, then we need to look beyond follower counts and start inspecting the messier layer: comments. So we added suspicious comment analysis for both Instagram and TikTok.

The flow is the usual Wykra pattern: you submit a request and get a task ID back, then a worker goes off to analyze a handful of recent posts or videos and comes back with structured results.
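For readers who haven't seen this pattern before, here is a minimal in-memory sketch of "submit, get a task id, let a worker do the slow part". It is not Wykra's code: the `tasks` dict, the `jobs` queue, the status names and the `analyze` placeholder are all assumptions made up for illustration, and a real setup would use a persistent queue and store.

```python
import queue
import threading
import uuid

# Hypothetical in-memory stand-ins for a real task store and job queue.
tasks = {}
jobs = queue.Queue()


def submit(profile):
    """Record a pending task, enqueue the work and return the task id at once."""
    task_id = str(uuid.uuid4())
    tasks[task_id] = {"status": "pending", "result": None}
    jobs.put((task_id, profile))
    return task_id


def analyze(profile):
    """Placeholder for the slow part: scraping plus LLM analysis."""
    return {"profile": profile, "riskLevel": "unknown"}


def worker():
    """Process queued jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        task_id, profile = job
        tasks[task_id]["status"] = "running"
        tasks[task_id]["result"] = analyze(profile)
        tasks[task_id]["status"] = "completed"


def run_demo():
    """Submit one request, let the worker drain the queue, return the task."""
    thread = threading.Thread(target=worker)
    thread.start()
    task_id = submit("annascooking_")
    jobs.put(None)  # stop the worker once the queued job is done
    thread.join()
    return tasks[task_id]
```

The point of the split is that `submit` returns immediately, so the caller never waits on scraping or the model; it only holds an id it can look up later.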

Instagram: suspicious comments

Request example:

curl --location 'http://localhost:3011/api/v1/instagram/profile/comments/suspicious' \
  --header 'Content-Type: application/json' \
  --data '{ "profile": "annascooking_" }'

In this case the analysis found a very “classic internet” situation:

  • a single user dumping a burst of random Cyrillic characters in under a minute (high confidence bot/spam)
  • a smaller amount of generic emoji-only noise
  • overall: mostly organic discussion, with one very obvious spam cluster

"analysis": {
    "summary": "Analysis reveals several concerning patterns in the comment section. The most notable is a series of suspicious comments from user 'klepaskolia' who posted 14 consecutive comments containing seemingly random Cyrillic characters within a one-minute timespan, strongly indicating bot or automated spam activity. The majority of authentic comments appear to be in Polish and are food-related, making these outlier comments particularly conspicuous. Beyond the obvious spam cluster, the engagement patterns appear largely organic with normal food-related discussions and reactions. The comments show natural language variations and authentic interactions between users, with genuine questions about recipes and cooking techniques. While the spam incident is significant, it appears to be isolated to a single user and timeframe.",
    "suspiciousCount": 15,
    "suspiciousPercentage": 21.4,
    "riskLevel": "medium",
    "patterns": [
      {
        "type": "bot",
        "description": "Rapid-fire comments with random Cyrillic characters from single user",
        "examples": [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
        "severity": "high"
      },
      {
        "type": "spam",
        "description": "Generic emoji-only comments with no context",
        "examples": [20, 28, 37, 42],
        "severity": "low"
      }
    ],
    "suspiciousComments": [
      {
        "commentIndex": 2,
        "commentId": "17981987252931776",
        "reason": "Part of automated spam sequence with random characters",
        "riskScore": 9
      },
      {
        "commentIndex": 13,
        "commentId": "18114796114575395",
        "reason": "Longest spam comment in sequence, random character string",
        "riskScore": 9
      },
      {
        "commentIndex": 37,
        "commentId": "17886914691398195",
        "reason": "Generic emoji-only comment with no context",
        "riskScore": 3
      }
    ],
    "recommendations": "Implement rate limiting to prevent rapid-fire commenting from single users. Add automated detection for comments containing random character strings, particularly in non-native alphabets. Consider requiring minimum character counts for comments to reduce low-effort emoji-only spam. Monitor accounts that post multiple times within very short time windows."
  }

Under the hood this starts with scraping real Instagram comments using Bright Data’s Instagram comments scraper. We first fetch the profile, extract a few recent post URLs and then collect comments for those posts.

Once we have the raw comments, we pass them to an LLM (Claude 3.5 Sonnet) with a fairly opinionated prompt. The goal isn’t to magically label comments as “fake”, but to ask very specific questions:

  • does this look like spam?
  • does this look automated?
  • are there weird engagement patterns?
  • are the same accounts behaving suspiciously across multiple comments?
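To make the shape of that step concrete, here is a rough sketch of turning a batch of comments plus those four questions into one prompt. The real prompts are in the repo and this is not them: `build_prompt`, the `[index] @author: text` layout and the wording around the questions are assumptions for illustration only.

```python
# The four questions quoted in the post; everything else here is illustrative.
QUESTIONS = [
    "Does this look like spam?",
    "Does this look automated?",
    "Are there weird engagement patterns?",
    "Are the same accounts behaving suspiciously across multiple comments?",
]


def build_prompt(comments):
    """Number (author, text) pairs and append the questions to ask about them."""
    lines = ["Comments:"]
    for index, (author, text) in enumerate(comments):
        lines.append(f"[{index}] @{author}: {text}")
    lines.append("")
    lines.append("Looking at the comments as a group, answer:")
    lines.extend(f"- {question}" for question in QUESTIONS)
    return "\n".join(lines)
```

Numbering the comments is what lets a model point back at specific ones, which is presumably how fields like `commentIndex` in the output above can be mapped to real comments.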

The important part is that this works the same way across platforms. We scrape real data, normalize it, then ask the model to reason about patterns rather than individual comments in isolation.
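A small sketch of what "normalize it" could look like: one shared comment shape, one adapter per platform, and a helper that looks at accounts rather than single comments. The raw field names (`id`, `user`, `text`, `likes`, `cid`, `nickname`, `body`, `digg_count`) are invented for this example and are not the scraper's actual schema.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Comment:
    """Platform-independent comment shape (hypothetical)."""
    platform: str
    comment_id: str
    author: str
    text: str
    likes: int


def from_instagram(raw):
    # "id", "user", "text" and "likes" are made-up field names.
    return Comment("instagram", str(raw["id"]), raw["user"],
                   raw["text"], int(raw.get("likes", 0)))


def from_tiktok(raw):
    # "cid", "nickname", "body" and "digg_count" are made-up field names.
    return Comment("tiktok", str(raw["cid"]), raw["nickname"],
                   raw["body"], int(raw.get("digg_count", 0)))


def repeat_authors(comments, minimum=2):
    """Return (author, count) for authors with at least `minimum` comments."""
    counts = Counter(comment.author for comment in comments)
    return [(author, n) for author, n in counts.most_common() if n >= minimum]
```

Once both platforms produce the same `Comment` shape, anything downstream (prompt building, per-author grouping) only has to be written once.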

If you’re curious about the actual prompts we’re using, they’re all in the repo.

TikTok: suspicious comments

Example:

curl --location 'http://localhost:3011/api/v1/tiktok/profile/comments/suspicious' \
  --header 'Content-Type: application/json' \
  --data '{ "profile": "ddlovato" }'


This one was interesting in a different way. The comments were mostly real fan engagement, but the suspicious layer looked more like:

  • “free tickets / VIP access” patterns (scam-ish or at least engagement bait)
  • generic low-effort bot-shaped comments
  • some weird engagement distribution (generic comments getting unusually high likes)

"analysis": {
    "summary": "Analysis of the 150 comments reveals some concerning patterns of potential engagement manipulation and suspicious activity, though most comments appear authentic. The most notable suspicious pattern involves comments asking for free tickets/VIP access, which could be attempts to scam or manipulate engagement. There are also several generic, low-effort comments that show patterns consistent with bot activity. However, the majority of engagement appears organic, with fans expressing genuine enthusiasm about concerts, music, and personal connections to the artist. The high proportion of emoji usage and personalized responses suggests a largely authentic fan community. The presence of comments in multiple languages (English, Portuguese, Spanish) with consistent engagement patterns further supports genuine international fan interaction.",
    "suspiciousCount": 12,
    "suspiciousPercentage": 8,
    "riskLevel": "low",
    "patterns": [
      {
        "type": "ticket_scam",
        "description": "Multiple comments asking for free tickets/VIP access in similar patterns",
        "examples": [52, 77, 80],
        "severity": "medium"
      },
      {
        "type": "bot_activity",
        "description": "Very short, generic comments with minimal engagement",
        "examples": [22, 38, 18],
        "severity": "low"
      },
      {
        "type": "engagement_manipulation",
        "description": "Unusually high likes on generic comments compared to more substantive ones",
        "examples": [51, 52, 53],
        "severity": "medium"
      }
    ],
    "suspiciousComments": [
      {
        "commentIndex": 52,
        "commentId": "7584231130246038286",
        "reason": "Suspicious request for free VIP ticket with unusually high engagement",
        "riskScore": 7
      },
      {
        "commentIndex": 77,
        "commentId": "7584268771855434514",
        "reason": "Similar pattern of requesting free VIP ticket",
        "riskScore": 6
      },
      {
        "commentIndex": 22,
        "commentId": "7584891712965100296",
        "reason": "Extremely generic single-word comment with no context",
        "riskScore": 4
      }
    ],
    "recommendations": "Implement automated filtering for ticket request scams and monitor unusual engagement spikes on generic comments. Consider adding verification requirements for high-engagement comments and maintaining current emoji-friendly environment as it encourages authentic interaction. Monitor but don't restrict international language comments as they appear to represent genuine fan engagement."
  }

Under the hood this follows the same pattern as Instagram.

We scrape real TikTok data using Bright Data’s TikTok datasets: first the profile, then a handful of recent videos and then comments for each video. Everything runs asynchronously through the same task + worker flow as the rest of Wykra.

Once the comments are collected, we pass them to an LLM (Claude 3.5 Sonnet) with one important constraint: comments containing emojis are treated as normal, authentic engagement and explicitly excluded from suspicious analysis. That small rule turns out to matter a lot on TikTok.
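The post describes this as a constraint given to the model. If you wanted to enforce the same rule deterministically before the model ever sees the comments, one possible approach is a pre-filter like the sketch below. This is an alternative, not what Wykra does, and the code point ranges are a deliberately rough approximation of "emoji" rather than a complete definition.

```python
# Approximate emoji ranges; not exhaustive and chosen only for illustration.
EMOJI_RANGES = (
    (0x1F300, 0x1FAFF),  # pictographs, emoticons, transport, extended symbols
    (0x2600, 0x27BF),    # miscellaneous symbols and dingbats
    (0x1F1E6, 0x1F1FF),  # regional indicators used for flags
)


def contains_emoji(text):
    """True if any character falls inside one of the approximate ranges."""
    return any(low <= ord(ch) <= high
               for ch in text
               for low, high in EMOJI_RANGES)


def split_for_analysis(comments):
    """Return (candidates, excluded): emoji comments are set aside as normal."""
    candidates = [c for c in comments if not contains_emoji(c)]
    excluded = [c for c in comments if contains_emoji(c)]
    return candidates, excluded
```

For example, `split_for_analysis(["love this 🔥", "free vip ticket pls"])` keeps only the second comment as a candidate for suspicious analysis.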

Instead of flagging half the comment section as “low quality”, the analysis focuses on patterns that actually look off: repeated scams, automation-shaped behavior or engagement that doesn’t match the content.

Just like on Instagram, the output is structured: a short summary, a risk level, detected patterns, concrete examples and recommendations. The goal isn’t to judge creators, but to separate real engagement from noise.
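Because the output is structured, it can be sanity-checked with plain code before anyone trusts it. The sketch below uses a trimmed copy of the TikTok result above (12 suspicious comments, 8 percent, 150 comments analyzed per the summary). `check_analysis` is a hypothetical helper, and since the post doesn't say whether `commentIndex` starts at 0 or 1, the range check tolerates both.

```python
import json

# Trimmed from the TikTok analysis shown earlier in the post.
RAW = """
{
  "suspiciousCount": 12,
  "suspiciousPercentage": 8,
  "riskLevel": "low",
  "suspiciousComments": [
    {"commentIndex": 52, "riskScore": 7},
    {"commentIndex": 77, "riskScore": 6},
    {"commentIndex": 22, "riskScore": 4}
  ]
}
"""


def check_analysis(analysis, total_comments):
    """Return a list of human-readable problems; empty means it looks consistent."""
    problems = []
    expected = round(100 * analysis["suspiciousCount"] / total_comments, 1)
    if abs(expected - analysis["suspiciousPercentage"]) > 0.1:
        problems.append(
            f"percentage {analysis['suspiciousPercentage']} != expected {expected}"
        )
    for item in analysis["suspiciousComments"]:
        index = item["commentIndex"]
        # Accept 0-based and 1-based indexing, since the base is not documented.
        if not 0 <= index <= total_comments:
            problems.append(f"commentIndex {index} out of range")
    return problems


problems = check_analysis(json.loads(RAW), total_comments=150)
```

Here 12 out of 150 is exactly 8 percent and every flagged index is within range, so `problems` comes back empty.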

TikTok: profile analysis

We also now have TikTok profile analysis as a separate endpoint:

curl --location 'http://localhost:3011/api/v1/tiktok/profile' \
  --header 'Content-Type: application/json' \
  --data '{ "profile": "ddlovato" }'

This returns the boring but necessary foundation: followers, likes, engagement rates, verification, top videos, posting patterns, plus a sanity-check style analysis (does this look authentic, what niche/topic, visible brands, etc.).

"analysis": {
    "summary": "This is clearly Demi Lovato's official TikTok profile, showing strong performance metrics with 8.2M followers and 146.7M total likes. The bio indicates active music promotion ('IT'S NOT THAT DEEP') and tour marketing, aligning with their status as a major recording artist.",
    "qualityScore": 5,
    "topic": "Music & Entertainment",
    "niche": "Pop Music Artist",
    "sponsoredFrequency": "low",
    "contentAuthenticity": "authentic",
    "followerAuthenticity": "likely real",
    "visibleBrands": [
      "Self-branded music content",
      "Tour promotions"
    ],
    "engagementStrength": "strong",
    "postsAnalysis": "Content focuses on music promotion, behind-the-scenes content, and personal moments. High likes relative to follower count indicate strong engagement.",
    "hashtagsStatistics": "Limited hashtag usage, typical of verified celebrity accounts relying on organic reach."
  }
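To put a number on "high likes relative to follower count", here is the simplest possible ratio using the two figures from the summary above (146.7M total likes, 8.2M followers). This is only one rough way to look at it, not necessarily how Wykra computes its engagement rates, and `likes_per_follower` is a name made up for this example.

```python
def likes_per_follower(total_likes, followers):
    """Lifetime likes divided by current followers; a rough ratio, not a per-post rate."""
    if followers <= 0:
        raise ValueError("followers must be positive")
    return total_likes / followers


ratio = likes_per_follower(146_700_000, 8_200_000)
print(round(ratio, 1))  # 17.9
```

Keep in mind that total likes accumulate over the whole life of the account, so this says "roughly 18 likes per current follower overall", not how any single video performs.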

At this point we have enough building blocks: profile analysis, comment analysis, suspicious pattern detection, the same async task flow across platforms. Individually, they work. But taken one by one, they also make it easy to lose sight of why we’re building this in the first place.

That’s probably why this week felt so exhausting.

When everything lives behind localhost, the project starts to feel abstract. You’re improving parts, but you’re no longer seeing the whole. So we’re taking this as a signal that it’s time to deploy.

Not because Wykra is “done”, but because we need to see it running as an actual system, in front of real users, and get feedback that isn’t coming from our own assumptions or test profiles.

Still tired. Still going.

If you want to support the project, ⭐️ the repo and follow me on X - it really helps.

Repo: https://github.com/wykra-io/wykra-api

Twitter/X: https://x.com/ohthatdatagirl

Blog: https://www.datobra.com/

Top comments (2)

Novea Madundo

This project is a perfect example of iterative improvement in the wild — love the in-depth public documentation

ZlayaBoroda

I love the transparency in sharing the messy middle — it really shows the reality of building in public