Last week was about observability. We added metrics and dashboards so we could see what the system was actually doing instead of relying on intuition.
So this week wasn’t about inventing something new from scratch. It was about answering a very practical question: can we extend the same idea to other social platforms without the whole system falling apart?
Short answer: partially.
## TikTok: Similar Problem, Different Shape
Quick reminder: Wykra is built to answer a very human question: “can you find me creators like this?” You describe the influencer you need and we go look for them. We already do this for Instagram. This week we tried to reuse the same pattern for other platforms, starting with TikTok.
At a high level, TikTok search follows the same story arc as Instagram: you send a free-text request like “Find up to 15 public TikTok creators from Portugal who post about baking or sourdough bread” and the API immediately hands you back a task id while the real work happens in the background. A worker picks up that task, turns your sentence into structured search parameters, runs a Bright Data dataset scraper, scores the discovered profiles with an LLM, filters out the useless ones, and finally stores everything so you can fetch the results from /tasks/:id later.
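That arc can be sketched in a few lines. Everything here is illustrative: the function names, the stand-in stubs for the LLM and Bright Data steps, and the in-memory store are assumptions, not the real Wykra internals.

```python
import uuid

TASKS = {}  # in-memory stand-in for the real task store


def extract_context(query: str) -> dict:
    # Stand-in for the LLM "vibe in, JSON out" step.
    return {"niche": "baking", "country": "PT", "search_terms": ["sourdough"]}


def scrape_tiktok(context: dict) -> list[dict]:
    # Stand-in for the Bright Data dataset call, already LLM-scored.
    return [
        {"url": "https://www.tiktok.com/@a", "relevance": 90},
        {"url": "https://www.tiktok.com/@b", "relevance": 40},
    ]


def create_search_task(query: str) -> str:
    """API side: hand back a task id immediately; real work happens later."""
    task_id = str(uuid.uuid4())
    TASKS[task_id] = {"status": "pending", "query": query, "profiles": []}
    return task_id


def run_search_task(task_id: str) -> None:
    """Worker side: context -> scrape -> score/filter -> store."""
    task = TASKS[task_id]
    task["status"] = "running"
    context = extract_context(task["query"])
    profiles = scrape_tiktok(context)
    task["profiles"] = [p for p in profiles if p["relevance"] >= 70]
    task["status"] = "completed"
```

The point of the split is that the HTTP handler only ever touches `create_search_task`; everything slow lives behind the task id.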
The first step is still “vibe in, JSON out”. We send the original query to an LLM and ask it to extract a small context object: what niche this is about, which location it mentions, a normalized country code, an optional target number of creators and a few short phrases that could go straight into the TikTok search box. If the model cannot even agree on a category, we stop there instead of pretending we know what to search for. Once the context is ready, we build up to three search terms, pick a country (either from the context or defaulting to US) and move on.
This is where TikTok diverges from Instagram. For Instagram we have to use Perplexity to discover profiles first and only then enrich them. TikTok, thanks to having a proper keyword search in the dataset, lets us skip that extra step. For each search term we generate a TikTok search URL, trigger the Bright Data TikTok dataset with that URL and country, poll until the snapshot is ready, download the JSON and then merge and deduplicate all profiles by their profile URL. The whole thing can take a while, so it lives as a long-running async job inside the same generic Task system we already use elsewhere.
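Two small pieces of that flow are easy to show concretely: building a search URL per term, and merging snapshots with URL-based dedup. The URL shape and field names here are assumptions for illustration:

```python
from urllib.parse import quote


def tiktok_search_url(term: str) -> str:
    # Assumed shape of the search URL fed to the Bright Data dataset.
    return f"https://www.tiktok.com/search?q={quote(term)}"


def merge_snapshots(snapshots: list[list[dict]]) -> list[dict]:
    """Merge profiles from several per-term snapshots, deduplicating
    by profile URL. First occurrence wins."""
    seen, merged = set(), []
    for snapshot in snapshots:
        for profile in snapshot:
            key = profile.get("url")
            if key and key not in seen:
                seen.add(key)
                merged.append(profile)
    return merged
```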
Once we have the raw profiles the LLM comes back in. For each profile we extract the basics (handle, profile URL, follower count, privacy, bio), send it together with the original query to the model and ask for a short summary, a quality score from 1 to 5 and a relevance percentage. Anything below 70 percent relevance is dropped; everything above is saved with its summary and score and linked to the task. The platform is different, but the pattern stays the same: structured context in the front, Bright Data in the middle and LLM scoring on the way out.
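The extraction-and-filter step, sketched with assumed field names for the scraped payload and the model's verdict:

```python
RELEVANCE_THRESHOLD = 70  # percent; anything below gets dropped


def profile_basics(raw: dict) -> dict:
    """The subset of fields sent to the model with the original query.
    Field names are assumptions about the scraped payload."""
    return {
        "handle": raw.get("handle"),
        "url": raw.get("url"),
        "followers": raw.get("followers"),
        "private": raw.get("private"),
        "bio": raw.get("bio"),
    }


def keep_relevant(scored: list[dict]) -> list[dict]:
    """Keep only profiles whose LLM-assigned relevance clears the bar."""
    return [p for p in scored if p.get("relevance", 0) >= RELEVANCE_THRESHOLD]
```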
Example of a real request and an answer:
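Shapes only; the endpoint paths, the task id, and every profile field below are hypothetical stand-ins, not real data from the Wykra API:

```python
# What you send:
request = {
    "query": ("Find up to 15 public TikTok creators from Portugal "
              "who post about baking or sourdough bread"),
}

# What comes back immediately:
created = {"id": "task_123", "status": "pending"}

# What /tasks/:id returns once the worker is done:
result = {
    "id": "task_123",
    "status": "completed",
    "profiles": [
        {"handle": "@some_baker",
         "url": "https://www.tiktok.com/@some_baker",
         "followers": 48200,
         "quality": 4,
         "relevance": 86,
         "summary": "Home baker posting sourdough process videos."},
    ],
}
```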
## Tasks, metrics and a bit of discipline
All of this runs as a long-running background job attached to a single task id.
The task goes through a simple lifecycle:
pending → running → completed or failed
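A tiny guard makes the lifecycle explicit: terminal states have no outgoing edges, and illegal jumps fail loudly. This is a sketch of the idea, not the actual Wykra task code:

```python
# Allowed lifecycle transitions; completed/failed are terminal.
TRANSITIONS = {
    "pending": {"running"},
    "running": {"completed", "failed"},
    "completed": set(),
    "failed": set(),
}


def advance(status: str, new_status: str) -> str:
    """Move a task along the lifecycle, rejecting illegal jumps."""
    if new_status not in TRANSITIONS[status]:
        raise ValueError(f"illegal transition {status} -> {new_status}")
    return new_status
```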
We store the task record and all TikTok profiles linked to it. When you fetch /tasks/:id, you see both the raw task status and the list of analyzed profiles. This turned out to be surprisingly helpful for debugging: if TikTok is empty but the task is completed, the problem is probably on the crawling or analysis side, not the queue.
Because we added observability last week, almost every step is also wrapped in metrics:
- how many TikTok search tasks are created, completed, or failed,
- how long they sit in the queue,
- how long Bright Data calls take and how often they error out,
- how many LLM calls we make and how expensive they are.
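An in-process sketch of the kind of counters and timers involved; the real setup presumably exports to a proper metrics backend, and all names here are illustrative:

```python
import time
from collections import Counter
from contextlib import contextmanager

counters = Counter()   # e.g. tasks created / completed / failed
timings = {}           # step name -> list of durations in seconds


@contextmanager
def timed(name: str):
    """Record how long a step takes (Bright Data call, LLM call, ...)."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings.setdefault(name, []).append(time.perf_counter() - start)


# Usage: bump counters as tasks move along and wrap each slow step.
counters["tiktok_tasks_created"] += 1
with timed("bright_data_poll"):
    pass  # the actual snapshot polling would happen here
counters["tiktok_tasks_completed"] += 1
```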
## YouTube: the half-hour spinner of doom
TikTok was the success story this week. YouTube was the reminder that not everything is ready to be wired into Wykra, no matter how clean the architecture looks on paper.
We tried plugging in the YouTube dataset with a very gentle test:
```json
{
  "url": "https://www.youtube.com/results?search_query=sourdough+bread+new+york+",
  "country": "US",
  "transcription_language": ""
}
```
In theory, this should behave a lot like TikTok: trigger crawl, wait, download JSON, move on with life. In practice, after ~30 minutes of spinning, the only thing we got back was:
```json
{
  "error": "Crawler error: Unexpected token '(', \"(function \"... is not valid JSON",
  "error_code": "crawl_error"
}
```
So for now YouTube isn’t really plugged into Wykra at all: the dataset just spins, throws a crawler JSON error, and gives us nothing useful to store or analyze. We’ve opened a ticket with Bright Data and postponed YouTube until that’s sorted.
## Threads: parameter present, logic absent
Threads got its own attempt too. The plan was simple: run a basic keyword-based discovery, something like:
```json
{
  "keyword": "technology"
}
```
Instead of profiles, we got back:
```json
{
  "error": "Parse error: Cannot read properties of null (reading 'require')",
  "error_code": "parse_error"
}
```
So the keyword parameter exists, the dataset exists, but the bit in the middle that’s supposed to connect them clearly doesn’t. For now we’re treating Threads the same way as YouTube: noted the issue and moved “proper Threads support” into the later bucket.
## LinkedIn: same old story
LinkedIn has a similar limitation to Instagram: there is no nice keyword search for “find me people who talk about X from country Y”. You can load profiles and pages, but not in the way Wykra needs.
The conclusion for now is the same as with Instagram: if we want proper keyword-driven discovery, we’ll probably have to plug in a Perplexity/LLM-style search layer on top of LinkedIn as well, not just rely on the dataset.
That’s a problem for another week, but at least now it’s a clearly defined problem, not a vague feeling that “LinkedIn is weird”.
Week 6 was mostly about testing how far the existing pattern stretches across new platforms and where it breaks. TikTok more or less behaves, YouTube and Threads don’t, and LinkedIn clearly needs its own search layer on top of the dataset. For now that’s enough — better a couple of flows that work than five half-broken ones.
If you want to support the project, ⭐️ the repo and follow me on X, it really helps.
Repo: https://github.com/wykra-io/wykra-api
Twitter/X: https://x.com/ohthatdatagirl