DEV Community: Sam Kennard

The Claude skill that knows your pipeline

Sam Kennard — Thu, 30 Apr 2026 11:47:04 +0000

Your CRM is the clean version of every deal. Your inbox is what actually happened. Six pre-built reasoning blocks that pull the second one into Claude via iGPT.

Your CRM says what someone remembered to log. The actual deal lives somewhere messier: the objection that never made it into Salesforce, the proposal revision sitting in Drive, the follow-up question buried in a forwarded thread, the competitor mention that appeared once three weeks before the deal stalled.

That's the sales context AI usually doesn't see.

The iGPT Sales Workflow Pack is a Claude plugin that pulls it into the conversation.

Two parts. The plugin is six pre-built reasoning blocks that know which questions to ask for each part of your sales week. iGPT is the API doing the work underneath, indexing your inbox and Drive so Claude can actually read your customer conversations. Without iGPT, the plugin is clever questions pointed at nothing. Without the plugin, iGPT is an API you'd have to learn before it becomes useful.

Together: sales reasoning that runs on your real data. About ten minutes of setup, then it's there every time you open Claude.

What this looks like in your actual week

Monday morning. You ask Claude what to focus on. It tells you which three deals went quiet last week, which two are sitting on an unanswered question from the prospect, and which one has an open commitment from your side that's now four days overdue. With the threads linked.

Before a call at 10am. You ask for a brief on the contact. Every commitment they made, every commitment you made, every objection they raised, the tone shift in the last reply, what's still open. You walk into the call with the version of yourself who actually re-read every email this morning.

Mid-week deal review. Your manager asks "what's stuck on the [account] deal?" Instead of scanning the thread for ten minutes, you ask Claude. It comes back with the specific objection that's gone unaddressed since the last call, the price hesitation buried in a quoted reply two weeks ago, and the security question their IT person asked that nobody answered.

Thursday afternoon. You ask which existing customers have shown expansion signals lately. Three accounts come back with the exact moments — a customer mentioning a problem your premium tier solves, a side comment about wanting to bring on more seats, a buying signal in a forwarded thread. None of these were in the CRM.

Friday before logging off. You run friction detection across every open deal in one query. Whatever needed your attention is sitting in front of you Monday morning, ranked.

The skills cover the rest of the week too: catching follow-ups you dropped, surfacing every time a competitor came up in your threads (with what they were compared on), finding past customer problems you already solved that map to what a new prospect is asking about.

You can run any of them on a single deal, a named contact, a named account, or your full book.

What you actually get

A 30-second answer to questions that used to take 30 minutes of digging.

| Question | What used to happen | What happens now |
|----------|---------------------|------------------|
| "What did we agree on with the prospect last Tuesday?" | Re-read three threads to reconstruct it | Open items, owner, dates raised, sources cited |
| "Which deals went quiet this week?" | Scan your inbox guessing at what's stale | Stale threads, last-touch dates, last unanswered question |
| "Why is the prospect not responding?" | Stare at the thread, theorize | Friction signals, unresolved objections, the moment momentum dropped |
| "What did we promise customers and never deliver on?" | You forget about it until they remind you | Commitments living in old threads that never made it to the CRM |

Run the friction detector across every open deal on Friday afternoon, and you have your Monday pipeline review on the way out the door.

Why this works (the iGPT part)

iGPT is the email context API doing the work underneath. It indexes your inbox and Google Drive in the background, reconstructs threads, links email attachments to their conversations, and returns structured intelligence with full source attribution in a single call. Per-user scoping. Read-only. The same infrastructure serious teams are using to build production sales agents right now.

The plugin is the sales-shaped layer on top. It knows what to ask iGPT for, in what shape, for each moment in the sales week.

Setup, about ten minutes total

Three things to do, once.

Before you start: this works on Claude Pro, Max, Team, or Enterprise plans. If you're on free, you'll need to upgrade.

1. Connect iGPT to your inbox

Go to igpt.ai/hub/playground and sign in with your work email. From the dashboard, click Add data source to connect Gmail or Outlook, and Google Drive if you use one. iGPT starts indexing in the background.

First index takes a few minutes for a normal inbox, longer if you've got years of mail in there. You don't need to wait — the rest of setup happens in parallel.

2. Add iGPT as a connector in Claude

Open Claude (the desktop app or claude.ai in your browser).

Click your profile picture in the bottom-left, then Settings → Connectors.
Scroll down, click Add custom connector.
Paste https://mcp.igpt.ai in the URL field. Save.
You'll see a new entry called igpt in your connector list. Click Connect on it.
Claude redirects you to authorize iGPT (same flow as authorizing Google or Slack). Approve, and you're back in Claude.

This is what lets Claude actually read your indexed inbox when the plugin asks for it.

3. Install the sales plugin

In Claude:

Go to Customize → Plugins.
Click Browse plugins, then switch to the Personal tab.
Click the + button to Add marketplace.
Paste https://github.com/igptai/skills in the URL field. Hit Sync.

You'll see all the iGPT plugins show up — sales, customer success, finance, marketing, and the rest. They're listed but not installed yet.

Find igpt sales and click Install.

You'll get a confirmation: "igpt sales is installed and ready to use."

That's it. The skill is available in every Claude chat you open, not just in one Project.

Now use it

Open a new chat. Ask in your own words:

"Prep me for my call with [contact] tomorrow."
"What follow-ups did I drop this week?"
"What's stuck on the [account] deal?"
"Find every time [prospect] mentioned a competitor in our threads."
"Which customers have shown expansion signals lately?"

Claude recognizes what you're asking for, runs the right reasoning block against your real inbox via iGPT, and returns the answer with source emails linked. First run, you'll see a confirmation that the iGPT connector is being used. After that, it's invisible.

Grab it

GitHub: github.com/igptai/skills (the marketplace URL you paste in step 3)

This is the install for any team where the deal context actually lives across email threads, SOWs in Drive, and forwarded chains. If your sales motion runs mostly on calls and Slack, with a CRM that's somehow always up to date, the plugin will be quieter for you.

For everyone else: reps get the upside without writing a line of code. If you build sales infrastructure for a living, the same API is right there to build your own plugins against.

Either way, your week just got 90 minutes shorter.

Context Engineering Has a Blind Spot

Sam Kennard — Wed, 18 Mar 2026 10:27:57 +0000

The biggest shift in agent design over the past year has been context engineering rather than improved models. Most of the published guidance focuses on codebases, documentation, and structured knowledge bases, and it's good guidance.

But there's a category of enterprise data that breaks every standard context engineering pattern, and almost nobody is writing about it: email.

Why email is different from everything else

When Google's ADK team writes about context engineering, they describe a pipeline: ingest data, compile a view, serve it to the model. When Anthropic describes it, they talk about curating tokens for maximum utility.

Both assume the source data has some structural integrity to work with: a codebase has files, functions, and imports; a knowledge base has documents with authors and dates; even Slack has channels and timestamps.

Email has none of that. A 20-reply business thread contains the same quoted text duplicated up to 20 times, and every email client quotes differently (Gmail uses > prefixes, Outlook uses indentation, Apple Mail wraps in blockquote HTML).

Forwarded chains collapse three separate conversations into a single message body with no structural separator. Inline replies break every deduplication pattern because someone typed new content between quoted blocks.

And the most critical information, the PDF with the actual contract terms or the invoice that needs reconciling, is sitting in an attachment that most context pipelines never touch.

This is where a huge amount of enterprise context actually lives, not in the CRM fields or the wiki, but in the messy, unstructured communication data where business actually happens.

What breaks at enterprise scale

The reason this matters isn't that one agent can't parse one email thread. It's what happens when you try to run context engineering across an organization's entire communication history.

A finance team closing the books at month-end needs to reconcile invoices against purchase order approvals across hundreds of vendors. The invoices arrive as PDF attachments and the approvals live in email threads scattered across 15 people's inboxes, often buried in a reply that says "approved, go ahead" with no formal record in any system.

An agent running multi-hop search over this data makes one retrieval call, gets a fragment, reformulates, searches again, and by hop 5 it's burning 40,000 tokens on a single vendor reconciliation.

Multiply that by 300 vendors and a single month-end close burns roughly 12 million tokens, with accuracy degrading on every query because each hop compounds the noise from the previous one.
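
The multiplication is easy to check. The sketch below uses only the two figures from the scenario above (40,000 tokens per vendor, 300 vendors) and assumes, for simplicity, that every vendor costs the same; real runs would vary with thread complexity.

```python
# Figures taken from the scenario above.
TOKENS_PER_VENDOR = 40_000  # tokens spent on one vendor reconciliation by hop 5
VENDORS = 300


def tokens_per_close(tokens_per_vendor: int, vendors: int) -> int:
    """Total tokens for one month-end pass, assuming a uniform cost per vendor."""
    return tokens_per_vendor * vendors


total = tokens_per_close(TOKENS_PER_VENDOR, VENDORS)
print(f"{total:,} tokens per month-end close")
```

Multiply the total by whatever your model charges per token to get a cost estimate for your own setup.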

A compliance team monitoring regulatory commitments has to scan 50,000 threads per month for obligations that were agreed to in email and never entered into a tracking system. The commitments aren't labeled; they're buried in sentences like "we can do that by Q3" from someone in a 30-reply thread where the first 20 messages were about something else entirely.

A multi-hop agent searching for "regulatory commitments" returns threads that mention regulations, not threads that contain actual commitments. The semantic gap between what the agent searches for and what the data looks like structurally is exactly where context engineering is supposed to help, and where standard approaches fail on email.

A sales organization running deal risk scoring across 200 active opportunities needs to detect signals that only exist in email patterns: the champion going quiet over two weeks, procurement entering a thread where they weren't before, reply latency increasing, tone shifting from collaborative to transactional.

None of this shows up in the CRM, which says the deal is "Stage 3, on track" while the email thread says the deal is dying. An agent that can't reason over the full communication history with participant attribution, temporal ordering, and cross-thread awareness will miss every one of these signals, and miss them confidently.
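
As an illustration of one such signal, here is a minimal sketch of measuring reply latency for a single contact. It assumes messages have already been attributed to a sender and timestamped; the field names (`sender`, `sent_at`), the example addresses, and the doubling threshold are all hypothetical choices, not part of any particular product.

```python
from datetime import datetime


def reply_latencies_hours(messages, contact):
    """Hours the contact took to answer each message sent by someone else."""
    latencies = []
    waiting_since = None  # timestamp of the oldest message still awaiting a reply
    for msg in sorted(messages, key=lambda m: m["sent_at"]):
        if msg["sender"] == contact:
            if waiting_since is not None:
                delta = msg["sent_at"] - waiting_since
                latencies.append(delta.total_seconds() / 3600)
                waiting_since = None
        elif waiting_since is None:
            waiting_since = msg["sent_at"]
    return latencies


def latency_is_rising(latencies, factor=2.0):
    """True when the latest reply took `factor` times longer than the earlier average."""
    if len(latencies) < 2:
        return False
    earlier = latencies[:-1]
    baseline = sum(earlier) / len(earlier)
    return latencies[-1] > factor * baseline


thread = [
    {"sender": "rep@example.com", "sent_at": datetime(2026, 3, 2, 9, 0)},
    {"sender": "champion@example.com", "sent_at": datetime(2026, 3, 2, 11, 0)},
    {"sender": "rep@example.com", "sent_at": datetime(2026, 3, 3, 9, 0)},
    {"sender": "champion@example.com", "sent_at": datetime(2026, 3, 3, 13, 0)},
    {"sender": "rep@example.com", "sent_at": datetime(2026, 3, 4, 9, 0)},
    {"sender": "champion@example.com", "sent_at": datetime(2026, 3, 6, 9, 0)},
]

latencies = reply_latencies_hours(thread, "champion@example.com")
print(latencies, latency_is_rising(latencies))
```

The calculation itself is trivial. The hard part is the input: it only works if sender attribution and ordering are already correct, which is exactly what flattened email text does not give you.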

The architectural gap

Standard context engineering assumes you can compile a useful view of your data at query time. For email at enterprise scale, this doesn't hold because the preprocessing required to make email useful is too expensive and too complex to do per-query.

Thread reconstruction, quoted text deduplication, participant attribution, attachment extraction, temporal ordering across threads that reference each other: this work needs to happen once at index time, not repeatedly inside an agent loop.

When you do it at index time, the agent gets pre-assembled context in a single retrieval call where latency is predictable, cost is fixed, and the same query returns the same result every time, which is the only way downstream automation actually works.

When you try to do it at query time through multi-hop search, you get variable latency (10-60 seconds depending on thread complexity), variable cost (scales with how messy the data is, which means your hardest queries are your most expensive), and variable accuracy (each hop builds on the previous hop's interpretation, and the error compounds).

The agent is simultaneously trying to reconstruct the conversation, figure out who said what, determine what's current versus what's quoted history, and answer the actual question. That's four jobs where each one is hard enough on its own.

What index-time context engineering looks like

The work that makes email usable for agents comes down to a few things that need to happen once, not per-query: reconstruct threads, strip quoted text, attribute who said what, and actually read attachments.

Then index all of it with semantic and structural metadata, scoped per-user so one person's agent can't surface another person's data.
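
Per-user scoping is easiest to reason about when it is enforced by the index itself rather than by a filter the caller has to remember. The toy in-memory class below sketches that idea; `ScopedIndex`, its methods, and the record shape are hypothetical and stand in for a real search backend.

```python
from collections import defaultdict


class ScopedIndex:
    """Toy index that stores each user's records in a separate bucket."""

    def __init__(self):
        self._by_user = defaultdict(list)

    def add(self, user_id, record):
        self._by_user[user_id].append(record)

    def search(self, user_id, term):
        # Only the caller's own bucket is scanned, so another user's
        # records can never appear in the results.
        needle = term.lower()
        return [
            record
            for record in self._by_user.get(user_id, [])
            if needle in record["text"].lower()
        ]


index = ScopedIndex()
index.add("alice", {"id": "a1", "text": "Invoice 1042 approved, go ahead"})
index.add("bob", {"id": "b1", "text": "Invoice 2210 is still under review"})

alice_hits = index.search("alice", "invoice")
print([hit["id"] for hit in alice_hits])
```

Because the user ID selects the bucket before any matching happens, a query bug can return too few results but not someone else's mail.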

Most teams skip this and go straight to multi-hop search, which works in demos and breaks in production at exactly the scale where the business case justifies the investment.

We build this infrastructure at iGPT, where a developer sends one API call and gets back structured, reasoning-ready context with source citations, with no loops or retries or per-query preprocessing.

from igptai import IGPT

client = IGPT(api_key="...", user="user_123")
result = client.recall.ask(
    input="Reconcile Q1 invoices from Apex Logistics, flag PO mismatches",
    quality="cef-1-normal",
    output_format="json"
)
# Structured JSON: vendor, invoice amounts, PO deltas, source email citations

The industry is right to focus on context, but most implementations assume the data is already usable, and email isn't.

If your agent is reasoning over email without fixing that first, it's not failing because the model is weak. It's failing because the context never made sense in the first place.

Docs: docs.igpt.ai
SDK: pip install igptai

Why email breaks every RAG pipeline

Sam Kennard — Thu, 12 Feb 2026 13:45:14 +0000

If you've built RAG over email, you know the feeling: everything works on PDFs and wiki pages, and then you point the same pipeline at someone's inbox and the whole thing quietly falls apart. Not with errors, but with bad retrieval you keep trying to fix with better chunking and bigger context windows until you realize the problem was never the retrieval.

Email threads aren't documents. Every standard RAG approach treats them like they are.

The standard approach

Connect to Gmail API, pull messages, chunk, embed, retrieve top-k.

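from googleapiclient.discovery import build
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Assumes `creds` holds authorized Google OAuth credentials and that
# `get_body_text` is your own helper that decodes a payload to plain text.
service = build("gmail", "v1", credentials=creds)
results = service.users().messages().list(userId="me", maxResults=50).execute()

# Fetch each message in full and keep its body, thread ID, and headers.
raw_emails = []
for msg in results.get("messages", []):
    full = service.users().messages().get(
        userId="me", id=msg["id"], format="full"
    ).execute()
    raw_emails.append({
        "id": msg["id"],
        "threadId": full.get("threadId"),
        "body": get_body_text(full.get("payload", {})),
        "headers": {
            h["name"]: h["value"]
            for h in full.get("payload", {}).get("headers", [])
        }
    })

# Chunk every body independently; the thread ID is the only structure that survives.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = []
for email in raw_emails:
    for split in splitter.split_text(email["body"]):
        chunks.append({"text": split, "metadata": {"thread_id": email["threadId"]}})

# Embed the chunks and load them into the vector store.
vectorstore = Chroma.from_texts(
    [c["text"] for c in chunks],
    OpenAIEmbeddings(),
    metadatas=[c["metadata"] for c in chunks]
)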

This works on static documents because each chunk is self-contained and relationships between chunks are semantic. Email has neither property.

6 ways email breaks this

1. Quoted text duplication

In a 12-message thread, the Gmail API returns every reply with the full quoted chain below it. The original message appears 12 times. When you embed this, the oldest messages and signature blocks dominate the embedding space because they're repeated in every chunk, and the model reads repetition as reinforcement. Your most recent, most relevant messages get buried.

The fix isn't regex because people reply inline, edit quotes, and forward with additions mid-quote.
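
One direction that goes beyond matching quote markers is to compare content: walk a thread oldest to newest and keep only paragraphs that have not appeared before. The sketch below is a simplified illustration of that idea, not a description of any production implementation. It handles only `>`-style markers, and it still misses quotes that were edited, since an edited paragraph no longer matches the original.

```python
import re

QUOTE_PREFIX = re.compile(r"^(?:\s*>)+\s?")
PARAGRAPH_BREAK = re.compile(r"\n\s*\n")


def new_paragraphs(bodies):
    """For bodies ordered oldest to newest, keep paragraphs not seen earlier in the thread."""
    seen = set()
    result = []
    for body in bodies:
        # Drop leading quote markers so quoted and original copies compare equal.
        unquoted = "\n".join(QUOTE_PREFIX.sub("", line) for line in body.splitlines())
        fresh = []
        for paragraph in PARAGRAPH_BREAK.split(unquoted.strip()):
            key = " ".join(paragraph.split()).lower()  # whitespace- and case-insensitive
            if key and key not in seen:
                seen.add(key)
                fresh.append(paragraph.strip())
        result.append(fresh)
    return result


thread_bodies = [
    "Can you send the revised proposal by Friday?",
    "Yes, Friday works.\n\n> Can you send the revised proposal by Friday?",
    "> Yes, Friday works.\n\nGreat, thanks.\n\n>> Can you send the revised proposal by Friday?",
]

deduped = new_paragraphs(thread_bodies)
for fresh in deduped:
    print(fresh)
```

Note that the third message is an inline reply, with new text between two quoted blocks, and the new text survives. A short paragraph that legitimately repeats (a second "Thanks!") would be dropped, which is one reason this stays a heuristic.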

2. Thread structure vanishes

Email threads are conversation trees, not linear sequences. Message 7 might reply to message 3, not message 6. When you embed, that structure disappears. Ask "who approved this" and retrieval surfaces someone saying "looks good" when they were actually being quoted by someone disagreeing with them.

3. CC vs. authorship confusion

Your model sees "David" in the CC line and "David's proposal" in the body and has no structural way to distinguish "David was informed" from "David authored this." Extraction pipelines end up confidently attributing work to people who never wrote a single reply because their names appeared in CC fields.
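
The distinction is available in the headers before any text is flattened. A minimal sketch, using the standard library's `email.utils.getaddresses` and assuming the header names arrive exactly as `From`, `To`, and `Cc` (real mail varies in capitalization, so normalize first):

```python
from email.utils import getaddresses


def participant_roles(headers):
    """Map each address on one message to 'author', 'to', or 'cc'."""
    roles = {}
    # Process the weakest role first so that 'author' wins if an
    # address appears in more than one header.
    for header, role in (("Cc", "cc"), ("To", "to"), ("From", "author")):
        for _name, address in getaddresses([headers.get(header, "")]):
            if address:
                roles[address.lower()] = role
    return roles


headers = {
    "From": "Sarah Lee <sarah@example.com>",
    "To": "Tom Park <tom@example.com>",
    "Cc": "David Kim <david@example.com>",
}

roles = participant_roles(headers)
print(roles)
```

Storing this alongside each chunk lets a downstream query ask for messages David authored, rather than messages where the string "David" appears.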

4. Forwarded thread forks

Someone forwards a thread to a new group. Now you have two conversations that share history but diverged, and Gmail treats them as separate threads with no link between them. Ask "what did the team decide" and retrieval pulls from either branch without knowing they're contradictory.
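
A cheap first pass at finding such forks is to group threads by subject after stripping reply and forward prefixes. This is only a candidate generator, and the prefix list (`Re`, `Fw`, `Fwd`) is an assumption that ignores localized variants; a real pipeline would confirm candidates by comparing quoted content or participants.

```python
import re
from collections import defaultdict

SUBJECT_PREFIX = re.compile(r"^(?:\s*(?:re|fwd?)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject: str) -> str:
    """Strip any run of Re:/Fw:/Fwd: prefixes and lowercase the remainder."""
    return SUBJECT_PREFIX.sub("", subject).strip().lower()


def candidate_forks(threads):
    """Group thread IDs by normalized subject; groups of two or more may be forks."""
    groups = defaultdict(list)
    for thread_id, subject in threads:
        groups[normalize_subject(subject)].append(thread_id)
    return {subject: ids for subject, ids in groups.items() if len(ids) > 1}


threads = [
    ("t1", "Q2 budget"),
    ("t2", "Fwd: RE: Q2 budget"),
    ("t3", "Offsite agenda"),
]

forks = candidate_forks(threads)
print(forks)
```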

5. Signatures and boilerplate at scale

Across a real organization: 30+ signature formats, compliance disclaimers in multiple languages, confidentiality notices longer than the actual messages. A meaningful portion of your token budget goes to this noise while the model treats it as content worth reasoning over.
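
The simplest case is the conventional signature delimiter, a line containing only two dashes and a space. The sketch below cuts the body at that line. It is deliberately narrow: many clients never emit the delimiter, and legal disclaimers appended by mail gateways need separate handling.

```python
def strip_signature(body: str) -> str:
    """Cut the body at the first conventional signature delimiter, if any."""
    lines = body.splitlines()
    for i, line in enumerate(lines):
        # "-- " is the conventional delimiter; some clients drop the trailing space.
        if line in ("-- ", "--"):
            return "\n".join(lines[:i]).rstrip()
    return body.rstrip()


message = (
    "Approved, go ahead.\n"
    "\n"
    "-- \n"
    "Jane Doe\n"
    "VP Finance\n"
    "This email and any attachments are confidential."
)

cleaned = strip_signature(message)
print(cleaned)
print(f"kept {len(cleaned)} of {len(message)} characters")
```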

6. Cross-thread temporal reasoning

"Let's revisit this next quarter" in January. "The timeline we discussed" in March. Completely different words for the same thing. The connection is temporal, not semantic, so vector similarity can't find it.

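Since the wording gives no handle, a linker has to lean on who was involved and when. The sketch below scores candidate follow-ups by participant overlap within a time window. The field names, the 120-day window, and the 0.5 overlap threshold are all hypothetical, and a real system would combine this with topic signals before trusting a link.

```python
from datetime import datetime, timedelta


def participant_overlap(a: set, b: set) -> float:
    """Jaccard overlap between two participant sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def link_candidates(source, others, max_gap=timedelta(days=120), min_overlap=0.5):
    """IDs of later messages that share enough participants with `source`."""
    linked = []
    for other in others:
        gap = other["sent_at"] - source["sent_at"]
        overlap = participant_overlap(source["participants"], other["participants"])
        if timedelta(0) < gap <= max_gap and overlap >= min_overlap:
            linked.append(other["id"])
    return linked


january = {
    "id": "jan",
    "sent_at": datetime(2026, 1, 15),
    "participants": {"sarah@example.com", "tom@example.com"},
}
later = [
    {
        "id": "mar",
        "sent_at": datetime(2026, 3, 10),
        "participants": {"sarah@example.com", "tom@example.com", "dana@example.com"},
    },
    {
        "id": "unrelated",
        "sent_at": datetime(2026, 3, 11),
        "participants": {"lee@example.com"},
    },
]

links = link_candidates(january, later)
print(links)
```
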
Why the usual fixes don't work

All six failures happen upstream of the model. Better models reason more confidently over the same broken input. Bigger context windows stuff in more duplicated text you're paying for.

Better prompts ask the model to reconstruct thread structure, deduplicate quotes, resolve attribution, and track temporal references on every single query. You're pushing infrastructure problems into the prompt.

The fix: treat email as a graph, not a document

Email threads are conversational graphs. Each message is a node, replies create edges, participants have roles that change over time, and decisions create cross-thread edges. The pipeline needs six layers between raw email and your model:

┌─────────────────────────────────────────────────────┐
│                   YOUR APPLICATION                   │
├─────────────────────────────────────────────────────┤
│  Layer 6: Hybrid Retrieval                           │
│  semantic search + metadata filters + graph traversal│
├─────────────────────────────────────────────────────┤
│  Layer 5: Cross-Thread Linking                       │
│  participant overlap, topic refs, temporal proximity │
├─────────────────────────────────────────────────────┤
│  Layer 4: Structured Metadata Extraction             │
│  decisions, tasks, owners, deadlines, sentiment      │
├─────────────────────────────────────────────────────┤
│  Layer 3: Participant & Role Tracking                │
│  From vs To vs CC, role changes across thread        │
├─────────────────────────────────────────────────────┤
│  Layer 2: Content Deduplication                      │
│  quoted text removal, inline edit preservation       │
├─────────────────────────────────────────────────────┤
│  Layer 1: Thread Reconstruction                      │
│  In-Reply-To / References headers → conversation tree│
├─────────────────────────────────────────────────────┤
│                  RAW EMAIL (Gmail API / IMAP)         │
└─────────────────────────────────────────────────────┘

Layer 1 is where most people start and stop. Map In-Reply-To headers to build the conversation tree:

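from collections import defaultdict

def build_thread_tree(messages):
    # First pass: index every message by Message-ID so a reply can find
    # its parent even when it is listed before the parent.
    by_message_id = {
        msg["headers"].get("Message-ID", ""): msg for msg in messages
    }
    children = defaultdict(list)
    roots = []

    # Second pass: attach each message to its parent, or treat it as a
    # root when the parent is not part of this batch.
    for msg_id, msg in by_message_id.items():
        reply_to = msg["headers"].get("In-Reply-To", "")
        if reply_to and reply_to in by_message_id:
            children[reply_to].append(msg_id)
        else:
            roots.append(msg_id)

    return roots, children, by_message_id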
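
The diagram also mentions the References header, which matters when the direct parent is missing from the batch (deleted, outside the fetch window, or in another mailbox). References lists ancestor Message-IDs oldest first, so a fallback can walk it from the end to find the closest ancestor that is known. A minimal sketch, with `find_parent` as a hypothetical helper name:

```python
def find_parent(headers, known_ids):
    """Return the Message-ID of the closest known ancestor, or None for a root."""
    reply_to = headers.get("In-Reply-To", "").strip()
    if reply_to and reply_to in known_ids:
        return reply_to
    # References is ordered oldest to newest, so search from the end.
    for ancestor in reversed(headers.get("References", "").split()):
        if ancestor in known_ids:
            return ancestor
    return None


known = {"<a@example.com>", "<b@example.com>"}

orphaned_reply = {
    "In-Reply-To": "<c@example.com>",  # parent is not in this batch
    "References": "<a@example.com> <b@example.com> <c@example.com>",
}

parent = find_parent(orphaned_reply, known)
print(parent)
```

With this fallback, a reply whose parent is missing attaches to its nearest surviving ancestor instead of becoming a spurious root.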

Layers 2-3 handle deduplication and participant roles. Both are straightforward in concept but brutal in practice because email clients format quotes differently, people edit them without marking changes, and the distinction between "David authored this" and "David was CC'd" needs to be structured data, not something the model infers from flattened text.

Layers 4-6 extract structured metadata (decisions, tasks, owners, deadlines), build cross-thread connections, and combine semantic search with metadata filtering and graph traversal so you can say "find messages from Sarah about Q2 budget where a decision was made" and have the retrieval handle filtering before semantic matching.
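
To make the filter-then-rank order concrete, here is a toy sketch of the "messages from Sarah about Q2 budget where a decision was made" query. The record fields (`sender`, `has_decision`) are hypothetical outputs of the metadata layer, word overlap stands in for embedding similarity, and graph traversal is left out entirely.

```python
def token_overlap(query: str, text: str) -> float:
    """Toy stand-in for embedding similarity: share of query words found in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def hybrid_search(records, query, sender=None, has_decision=None, k=3):
    # Step 1: structured filters narrow the candidate set.
    candidates = [
        r for r in records
        if (sender is None or r["sender"] == sender)
        and (has_decision is None or r["has_decision"] == has_decision)
    ]
    # Step 2: rank only the survivors by (toy) semantic similarity.
    ranked = sorted(
        candidates, key=lambda r: token_overlap(query, r["text"]), reverse=True
    )
    return ranked[:k]


records = [
    {"id": 1, "sender": "sarah", "has_decision": True,
     "text": "Approved the Q2 budget at the revised number"},
    {"id": 2, "sender": "sarah", "has_decision": False,
     "text": "Can we discuss the Q2 budget next week"},
    {"id": 3, "sender": "tom", "has_decision": True,
     "text": "Q2 budget decision: approved"},
]

hits = hybrid_search(records, "Q2 budget", sender="sarah", has_decision=True)
print([hit["id"] for hit in hits])
```

All three records mention the Q2 budget, so similarity alone cannot separate them. The filters do that work, and they can only exist if the earlier layers produced the metadata.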

This is what we built iGPT to handle. All six layers, one API call. Docs: docs.igpt.ai

What the difference looks like

Standard RAG:

results = vectorstore.similarity_search("What are the open action items?", k=5)
# - 2 chunks dominated by signature blocks
# - 1 chunk from a quoted reply (wrong attribution)
# - 1 relevant chunk buried in noise
# - 1 chunk from an unrelated thread (similar keywords)

Through iGPT:

from igptai import IGPT

client = IGPT(api_key="your-api-key", user="user-123")
response = client.recall.ask(
    input="What are the open action items from this week?",
    quality="cef-1-normal"
)

Seven source documents referenced, structured data with owners, dates, and attribution. No signatures, no duplicated quotes, no misattributed CC recipients. The infrastructure handled it before the model saw anything.

Streaming shows the pipeline stages in real time:

for event in client.recall.ask(
    input="Who committed to what in the last 7 days?",
    stream=True,
    quality="cef-1-normal"
):
    if "delta" in event:
        print(event["delta"]["output"], end="", flush=True)

Sources referenced: 22
Here is a summary of commitments made in the last 7 days...

| Date       | Person   | Commitment                                      |
|------------|----------|-------------------------------------------------|
| 2026-02-09 | Jane Doe | Proposed new campaign, requested alignment sync |
| 2026-02-10 | John Doe | Reviewing blog and one-pager, final versions    |

Works the same in Node.js:

import IGPT from "igptai";

The API is identical.

Try it

pip install igptai

from igptai import IGPT

client = IGPT(api_key="your-key", user="your-user-id")

auth = client.connectors.authorize(
    service="google",
    scope="email",
    redirect_uri="https://your-app.com/callback"
)

datasources = client.datasources.list()

response = client.recall.ask(
    input="What decisions were made this week and who owns next steps?"
)

Don't want to set up OAuth just to see it work? The playground lets you connect your inbox and run queries in about five minutes, no code required.

Links:

🔗 iGPT Website
📖 API Documentation
🐍 Python SDK (PyPI)
📦 Node.js SDK (npm)
🛝 Playground