I'm building MatchMGT — a personal CRM for dating app users. Think of it as
Tinder meets HubSpot: you manage your matches, track conversations, set
reminders, and get AI-powered suggestions. One of its most popular features
is the WhatsApp chat analyzer. Here's how I built it.
## The Problem
If you use dating apps seriously, you've been there: you matched with someone
weeks ago, had a good conversation, and now you want to reconnect — but you
don't remember what you talked about, what they liked, or what topics you left
hanging.
I wanted to solve that. Instead of scrolling through hundreds of messages
manually, I built a feature that lets you paste a WhatsApp export and get back
a structured analysis in seconds.
## How WhatsApp Exports Work
WhatsApp lets you export any chat as a .txt file. The format looks like this:
```
12/3/2024, 10:45 AM - Sofia: Did you watch that series I recommended?
12/3/2024, 10:47 AM - You: Not yet! Which one was it again?
```
Each line has a timestamp, sender name, and message content. My first task was
parsing this reliably, because the format varies slightly between Android and iOS,
and between Spanish and English system messages.
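To give a feel for the variation, here are two common line shapes and a regex for each. This is an illustrative sketch, not the exact set of patterns the product ships with:

```javascript
// Two common WhatsApp export shapes (illustrative, not exhaustive):
//   Android: 12/3/2024, 10:45 AM - Sofia: message
//   iOS:     [12/3/24, 10:45:12 AM] Sofia: message
const androidRe =
  /^(\d{1,2}\/\d{1,2}\/\d{2,4}),?\s+(\d{1,2}:\d{2}(?::\d{2})?(?:\s?[AP]M)?)\s+-\s+([^:]+):\s+(.+)$/i;
const iosRe =
  /^\[(\d{1,2}\/\d{1,2}\/\d{2,4}),?\s+(\d{1,2}:\d{2}(?::\d{2})?(?:\s?[AP]M)?)\]\s+([^:]+):\s+(.+)$/i;

const androidLine = "12/3/2024, 10:45 AM - Sofia: Did you watch that series I recommended?";
const iosLine = "[12/3/24, 10:45:12 AM] Sofia: Did you watch that series I recommended?";

console.log(androidRe.test(androidLine)); // true
console.log(iosRe.test(iosLine));         // true
```

A practical approach is to try each pattern against the first few lines of the export and pick whichever one matches, then use it for the rest of the file.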
## Parsing the Chat on the Backend
The Worker receives the raw text and runs it through a parser that:
- Matches each line against a date/sender/content regex
- Skips system messages ("Messages are end-to-end encrypted", multimedia omitted, etc.)
- Handles multiline messages (continuation lines with no timestamp)
- Returns a clean array of `{ date, sender, content }` objects
```javascript
function parseWhatsApp(rawText, monthsBack = 6, sinceDate = null) {
  const lines = rawText.split("\n");
  const messages = [];
  // Cutoff: an explicit sinceDate, or N months back from today
  const cutoff = sinceDate || (() => {
    const d = new Date();
    d.setMonth(d.getMonth() - monthsBack);
    return d;
  })();
  // Captures: date, time (optional seconds / AM-PM), sender, content
  const regex =
    /^(\d{1,2}\/\d{1,2}\/\d{2,4}),?\s+(\d{1,2}:\d{2}(?::\d{2})?(?:\s?[AP]M)?)\s+-\s+([^:]+):\s+(.+)$/i;

  for (const line of lines) {
    const match = line.match(regex);
    if (match) {
      const [, dateStr, timeStr, sender, content] = match;
      const date = new Date(`${dateStr} ${timeStr}`);
      if (isNaN(date.getTime())) continue;
      if (sinceDate ? date <= cutoff : date < cutoff) continue;
      // Skip system messages and media placeholders
      if (content.includes("end-to-end") || content === "<Media omitted>") continue;
      messages.push({ date, sender: sender.trim(), content: content.trim() });
    } else if (messages.length > 0) {
      // No timestamp: this is a continuation line of a multiline message
      const trimmed = line.trim();
      if (trimmed) messages[messages.length - 1].content += "\n" + trimmed;
    }
  }
  return messages;
}
```
## Sending It to Claude
Once parsed, I format the messages into a readable block and send them to
Claude Sonnet via the Anthropic API. The main constraint: token cost.
A long WhatsApp conversation can have thousands of messages. I cap it at
40,000 characters, and if the chat is longer, I take the first 50 messages
(context) + the most recent 450 (relevance):
```javascript
function formatWhatsAppForAI(messages, maxChars = 40000) {
  if (messages.length === 0) return "Empty conversation.";
  const from = messages[0].date.toLocaleDateString();
  const to = messages[messages.length - 1].date.toLocaleDateString();
  let text = `Conversation from ${from} to ${to} (${messages.length} messages):\n\n`;

  let selected = messages;
  if (messages.length > 500) {
    // First 50 messages for context, most recent 450 for relevance
    selected = [...messages.slice(0, 50), ...messages.slice(-450)];
    text += `[Representative sample of ${selected.length} messages]\n\n`;
  }

  for (const msg of selected) {
    const line = `${msg.sender}: ${msg.content}\n`;
    if ((text + line).length > maxChars) break; // hard cap on prompt size
    text += line;
  }
  return text;
}
```
Then the actual Claude call from the Cloudflare Worker:
```javascript
async function callClaude(apiKey, systemPrompt, userMessage) {
  const response = await fetch("https://api.anthropic.com/v1/messages", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-api-key": apiKey,
      "anthropic-version": "2023-06-01",
    },
    body: JSON.stringify({
      model: "claude-sonnet-4-20250514",
      max_tokens: 2000,
      system: systemPrompt,
      messages: [{ role: "user", content: userMessage }],
    }),
  });
  if (!response.ok) {
    throw new Error(`Anthropic API error: ${response.status}`);
  }
  const data = await response.json();
  return data.content?.[0]?.text || "";
}
```
The system prompt instructs Claude to respond only in valid JSON with this structure:
```json
{
  "interests": ["hiking", "indie films", "cooking"],
  "personality_traits": ["curious", "introverted", "witty"],
  "date_ideas": ["visit a farmers market", "cooking class together"],
  "conversation_topics": ["her trip to Patagonia", "that book she mentioned"],
  "gift_ideas": ["a cookbook", "a plant"],
  "summary": "Sofia is curious and thoughtful. Conversations flow naturally around culture and food."
}
```
No markdown. No extra text. Just the JSON. This makes frontend parsing trivial.
## What I Learned
1. Always validate the JSON response.
Claude is very consistent, but edge cases happen (empty chats, very short
conversations). Wrap the parse in try/catch and return a graceful error.
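A defensive parser for the model's response might look like this. It's a sketch, and `parseAnalysis` is my name for it, not something from the product:

```javascript
function parseAnalysis(raw) {
  // Strip accidental markdown fences before parsing, just in case
  const cleaned = raw
    .replace(/^```(?:json)?\s*/, "")
    .replace(/```\s*$/, "")
    .trim();
  try {
    const parsed = JSON.parse(cleaned);
    // Minimal shape check before the frontend touches it
    if (!Array.isArray(parsed.interests)) return null;
    return parsed;
  } catch {
    return null; // caller shows a graceful error state
  }
}
```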
2. The 40k char limit is a cost decision, not a technical one.
Claude can handle more. I chose this number to keep per-analysis cost under $0.05.
My next step is Anthropic's prompt caching, to cut the cost of repeated analyses
of the same contact.
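With prompt caching, the shared system prompt becomes a content block carrying a `cache_control` marker, so repeated requests reuse the cached prefix. Roughly, the request body would look like this (a sketch based on Anthropic's documented request shape; check the current API docs before relying on it):

```json
{
  "model": "claude-sonnet-4-20250514",
  "max_tokens": 2000,
  "system": [
    {
      "type": "text",
      "text": "You are a dating-conversation analyst. Respond only in valid JSON...",
      "cache_control": { "type": "ephemeral" }
    }
  ],
  "messages": [{ "role": "user", "content": "<formatted chat here>" }]
}
```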
3. Date parsing is the hardest part.
Not Claude. Not the API. The regex. WhatsApp format differs between phone
languages, OS versions, and export settings. I had to handle at least 4 variations
before it felt solid.
4. Run AI on the backend, not the browser.
Keeping the Anthropic API key in a Cloudflare Worker means it's never exposed
to the client. The frontend just sends the raw text and receives structured JSON.
## The Result
Users paste their WhatsApp export and get back a structured profile of the person
they're talking to — interests, personality traits, conversation starters, date ideas,
even gift suggestions. All in under 5 seconds.
It's one of those features that sounds gimmicky until you actually use it and realize
you've been doing this manually in your head for years.
If you're building something similar or just want to see the full product, check out
MatchMGT — it's free to start, no credit card needed.
Happy to answer questions about the Cloudflare Workers setup, the parsing logic,
or the prompt engineering in the comments. 👇