DEV Community: Tsvetan Gerginov

Pin your MCP server contracts the way you pin your dependencies

Tsvetan Gerginov — Tue, 21 Jul 2026 23:43:41 +0000

You pin your npm dependencies. You have a lockfile. You review the diff when it changes.

Now consider the MCP servers your agent depends on. What pins those?

tools/list hands back names, descriptions, and JSON schemas, and your agent trusts all of it. The description isn't documentation — it's the instruction the model reads to decide what a tool does and when to call it. There's no version pin, no integrity check, no diff to review. The server changes, and your agent's behaviour changes with it.

Four ways an MCP dependency breaks you quietly

1. A description is rewritten. Same tool name, same schema, different text. The model now behaves differently and nothing registers a change. This is the one that matters most, because it requires no schema change at all — which is exactly why schema-only diffing misses it.

2. A required parameter appears. Your existing calls start failing with invalid params, and you find out from production traffic.

3. readOnlyHint flips from true to false. A tool you allow-listed as safe to call freely can now mutate state.

4. A tool disappears. Best case, a clean error. Worst case, your agent improvises with something else.

Prior art, up front

Invariant Labs did the foundational work here. They named tool poisoning and MCP rug pulls, and their mcp-scan has detected description changes via tool hashing since April 2025. If you want to audit the MCP servers installed on your own machine, that's the tool — it scans Claude, Cursor, and Windsurf configs, detects cross-origin escalation (tool shadowing), and offers a proxy mode with live guardrails. None of which I do.

I needed something adjacent: a CI gate. Not "is my laptop safe," but "did this dependency's contract change since my last release." And it had to run against internal servers without sending tool descriptions to anyone's API.

So I built mcpward.

Two commands

npx mcpward baseline   # snapshot the server's contract into a lockfile
npx mcpward diff       # in CI: fail the build if it drifted

The baseline captures every tool's name, description hash, input and output schemas, and annotations. Then, when the server ships an update:

DRIFT (5 failed)
  ✗ Tool "echo" description changed (possible rug-pull)
  ✗ Tool "compute" inputSchema added required property "multiplier"
  ✗ Tool "read_data" readOnlyHint changed from true to false (tool may now mutate state)
  ✗ Tool "removed_tool" was removed

Summary: 2 passed | 5 failed

Exit code 1. The build fails. Four contract changes, none of which would have surfaced at runtime until something broke.

Breaking vs non-breaking

Not every change should fail a build. Adding an optional parameter is fine. Adding a required one is not. The classification:

Change	Class	Fails by default
Tool removed	breaking	yes
Description changed	rug-pull	yes
Required field added, type narrowed, enum tightened	breaking	yes
Optional field added, type widened, constraint loosened	non-breaking	no
`readOnlyHint` true→false, `destructiveHint` false→true	annotation change	yes
Tool added	info	no

Getting that line right is the hard part of the whole tool. It's a pure function with an exhaustive fixture-backed test suite behind it, and it's the part I'd most like to be told I've got wrong.

What else it checks

Since it's already speaking the protocol as a real client:

Protocol compliance — handshake, version negotiation, capability consistency, JSON-RPC correctness.

The two-layer error contract. This one is underrated. MCP distinguishes protocol errors (a JSON-RPC error object) from tool errors (a successful result carrying isError: true). A tool that fails its job — file not found, upstream 500 — should return the second, not the first. Servers get this backwards routinely, and it changes how every client must handle the failure. Nothing else checks it.

Tool-poisoning heuristics — injection-like phrasing in descriptions, hidden and zero-width unicode, schemas soliciting API keys or passwords, readOnlyHint that contradicts an obviously destructive tool. Output is SARIF, so findings land in the GitHub Security tab.

Behavioral suites — declarative YAML cases with JSONPath assertions:

suites:
  - tool: read_file
    cases:
      - name: missing file is a tool error, not a protocol error
        args: { path: "/does-not-exist" }
        expect: { tool_is_error: true }
      - name: invalid params are a protocol error
        args: {}
        expect: { protocol_error_code: -32602 }

Latency budgets — p50/p95 per tool against a configurable threshold.

On trusting a testing tool

A test harness nobody can trust is worse than none, so the testing approach is deliberately paranoid.

Every check is developed against controlled fixture servers whose correctness is defined up front: a fully compliant one, a deliberately malformed one, a pair differing by exactly one change of each classification class, a slow one, and a poisoned one. Real third-party servers can't serve as ground truth, because you don't control whether they're correct.

And every check has a negative test proving it can go red. A check that only ever passes is a false-confidence generator, not a feature. The clean fixture also has to stay 100% clean through every release — a security check that cries wolf trains people to ignore it, which is worse than not shipping it.

Try it

npx mcpward init
npx mcpward run

Black-box, so it works against servers you didn't write, over stdio or Streamable HTTP, in any implementation language. MIT licensed. Runs entirely on your machine: no account, no API calls, no telemetry.

https://github.com/TsvetanG2/mcpward

If you've been bitten by description drift in practice — or if you think I've drawn the breaking/non-breaking line in the wrong place — I'd like to hear it.

Building a Resilient Instagram Scraper With Selenium — What Mimicking Human Behavior Actually Looks Like

Tsvetan Gerginov — Fri, 12 Jun 2026 20:54:27 +0000

Up front: this is a personal/research tool for downloading from public Instagram profiles. Use it responsibly and within Instagram's Terms of Service and your local laws. This post is about the engineering — specifically, what it takes to make browser automation behave less like a bot — not about evading anyone.

Scraping any modern social platform is less a parsing problem and more a behavioral one. The HTML is the easy part. The hard part is that the site is actively watching how you act, and the moment you act like a script — instant scroll to the bottom, requests at machine speed, no pauses — you hit a challenge page and you're done.

I built InstagramWrapperPostScraper as a Python + Selenium tool that drives a real Microsoft Edge browser to download photos, videos, and captions from public profiles. The interesting engineering isn't "how do I find the image URL" — it's "how do I make a browser automation script move through a page the way a person would." MIT licensed, Python 3.10+.

Why a real browser instead of an API or HTTP

There are three broad ways to pull data off Instagram, and they fail differently:

API approaches run into rate limits fast and require credentials/tokens that get throttled
Plain HTTP scraping is brittle and trivially detectable — no JS execution, obvious request patterns
Driving a real browser (this approach) executes the actual page JS, renders like a human's session, and can keep working through temporary rate-limit blocks

The tradeoff: a real browser is slower and heavier. But for a personal-scale download tool, reliability beats speed.

The actually-interesting part: human-like behavior

The most recent version (0.0.2) is almost entirely about making the scroll behavior look human, and this is the part I'd point any automation person to. A naive scraper does scrollTo(bottom) and fires requests as fast as the network allows. This one deliberately doesn't:

Randomized scroll steps — it scrolls 50–90% of the viewport at a time, not straight to the bottom
Occasional scroll-ups — sometimes it scrolls back up, the way a human re-reads something
Random pauses — 2–5 seconds between actions instead of hammering
Longer initial waits — 4–7 seconds when first opening a profile (bumped up from 3–5s)
Periodic challenge checks — every 10 scrolls it checks whether a rate-limit/challenge page has appeared

That last point connects to the other 0.0.2 improvement: a dedicated _is_challenge_page() method that recognizes captcha/challenge pages by checking the URL plus DOM selectors, rather than naively grepping the page source. Source-string matching gives false positives the moment Instagram tweaks copy; checking structure is more robust.

There's also better end-of-profile detection — it retries scroll up/down 5 times before concluding it's actually reached the bottom, instead of giving up after one attempt — and a carousel retry path that handles duplicate slide URLs and skips blocked slides.

Clean output structure

One thing I cared about: the downloads should be usable, not a flat dump of files. Each post gets its own folder, carousels keep their slide order, and every post's caption is saved alongside the media:

downloads/
└── username/
    ├── images/
    │   ├── post_1/
    │   │   ├── username_1.jpg
    │   │   └── description.txt
    │   └── post_2/
    │       ├── username_2_01.jpg   ← carousel slide 1
    │       ├── username_2_02.jpg   ← carousel slide 2
    │       └── description.txt
    └── videos/
        └── post_3/
            ├── username_3.mp4
            └── description.txt

Honest limitations

I'd rather you know the walls before you hit them. Straight from the README:

Public profiles only — private profiles need the scraper account to follow them
Edge only — no Chrome or Firefox support; it relies on Edge WebDriver
Instagram UI changes break selectors — when that happens, update Selenium/Edge and retry. This is the permanent tax on scraping anything you don't control
Rate limits still apply — on very large profiles (1000+ posts) expect pauses and retries; the human-like behavior reduces blocks, it doesn't make you invincible
No proxy support — every request comes from your real IP

That last two are deliberately on the label. This isn't a tool that pretends to be undetectable, and I'd be suspicious of any that did.

The takeaway worth stealing

Even if you never touch Instagram, the general lesson ports to any Selenium/Playwright automation: the gap between "works once on my machine" and "works repeatedly" is almost entirely about timing and behavioral realism. Randomized waits, partial scrolls, structural (not string-based) state detection, and retry-with-backoff are the difference between a script that runs and a script that keeps running.

Links:

🔗 Repo: github.com/TsvetanG2/InstagramWrapperPostScraper

If you've built browser automation that has to survive a hostile, frequently-changing site, I'd like to hear which behavioral tricks actually moved the needle for you.

I Built an MCP Server With 132 Tools So Claude Can Manage Cognigy.AI Agents for Me

Tsvetan Gerginov — Fri, 12 Jun 2026 20:48:31 +0000

I've spent some quite of time building conversational AI agents on Cognigy.AI — enterprise voice bots, multilingual flows, NLU training, the works while working at Deloitte. It's a powerful platform. It's also a lot of clicking. Create flow, open node editor, configure node, train intents, create snapshot, promote to next environment... and now we live in a world where my coding assistant can write entire applications, but couldn't touch any of that.

So I fixed it. cognigy-ai-mcp-management-server is a local MCP (Model Context Protocol) server that gives AI assistants like Claude, Claude Code, and Cursor programmatic access to the Cognigy.AI Management API. 132 tools, TypeScript, MIT licensed, on npm.

Instead of clicking through the UI, you can now tell your assistant things like:

"Create a new intent for order cancellation with these example sentences, then retrain the NLU"
"Run the regression test playbook and summarize what broke"
"Diff the current snapshot against production and tell me what changed"
"Find every flow that uses this deprecated connection"

What's in the box

The 132 tools span essentially the whole management surface: flows and nodes (full CRUD plus search and AI output generation), intents and NLU training, playbooks and regression testing, snapshots and packages (create, diff, promote across environments), Knowledge AI / RAG stores (21 tools just for that), LLM provider configuration, connections, extensions, contact profiles with GDPR export, analytics, and audit logs. The full list lives in TOOLS.md.

Design decisions I'd defend in a code review

Building an MCP server that can mutate production conversational AI agents forces you to think about safety differently than a read-only integration. A few choices I made:

1. dryRun: true by default on every mutating tool. An LLM with write access to your production agents is a chainsaw. Every tool that creates, updates, or deletes anything defaults to a dry run — the assistant sees exactly what would happen and has to explicitly flip the flag to execute. The destructive path requires intent, not just a hallucinated tool call.

2. Secrets never reach the model. API keys live only in environment variables and memory; connection secrets coming back from the Cognigy API are automatically redacted before the LLM sees them. Your model context should never contain credentials — that's non-negotiable.

3. Async-aware tooling. NLU training and snapshot operations are long-running tasks on Cognigy's side. The tools poll task status until completion instead of returning a job ID and leaving the LLM to guess, which keeps multi-step agent workflows from silently desyncing.

4. Zod validation on every input. LLMs produce mostly correct tool arguments. "Mostly" is doing heavy lifting in that sentence. Every tool validates its inputs with Zod schemas before anything hits the API.

5. Peer dependency instead of bundling. Cognigy's official REST client is published under a proprietary license, so this server declares it as a peer dependency rather than bundling it. You install it yourself and accept Cognigy's terms directly; my MIT license covers only my code. Licensing hygiene is boring until it isn't.

Mock-first development

You don't need a Cognigy account to hack on this. The repo ships with a Prism mock server generated from the OpenAPI spec:

# Terminal 1
npm run mock

# Terminal 2
npm test   # 49 tests against the mock

TypeScript types are also generated from the OpenAPI spec (npm run gen:types), so when Cognigy updates their API, regenerating types surfaces breakage at compile time instead of at runtime in someone's production tenant.

Getting started

npm install @cognigy/rest-api-client   # official Cognigy SDK, their license
npm install -g cognigy-ai-mcp-management-server

Then point your MCP client at it — for Claude Code, drop this in .mcp.json:

{
  "mcpServers": {
    "cognigy": {
      "command": "npx",
      "args": ["cognigy-ai-mcp-management-server"],
      "env": {
        "COGNIGY_BASE_URL": "https://api-trial.cognigy.ai",
        "COGNIGY_API_KEY": "your-api-key-here"
      }
    }
  }
}

Works with a free Cognigy trial account, SaaS, or on-prem. Configs for Claude Desktop and Cursor are in the README.

The honest footnote

Mid-project, I discovered that NiCE (Cognigy's parent company) had shipped an official MCP server. Did I consider abandoning mine? For about an hour. Then I kept going, because (a) I was learning an enormous amount about the Management API surface by mapping all of it, and (b) an independent, MIT-licensed, mock-testable implementation with dryRun-by-default semantics is a different artifact than an official one. If the official server fits your needs better — use it! This one exists, it's open, and the code shows exactly how to wrap a large enterprise API into LLM-safe tooling. That alone made it worth shipping.

(Standard disclaimer: this is an independent project, not affiliated with or endorsed by Cognigy or NiCE. You need your own Cognigy account and API key.)

Try it

If you build on Cognigy.AI — as a developer, solution architect, or SI partner — I'd genuinely love to hear what workflows you'd automate first. And if you're building MCP servers for other enterprise platforms, the dryRun + redaction + async-polling patterns here are portable; steal them.

I Got Tired of Downloading Email Attachments One by One, So I Built a Desktop App for It

Tsvetan Gerginov — Fri, 12 Jun 2026 20:42:41 +0000

Picture this: 200 invoices in your inbox, scattered across six months of emails, and accounting needs all of them in a folder. Today. Gmail's UI gives you exactly one way to do this — open email, click attachment, download, repeat. Two hundred times.

I refused. So I built Email Attachment Downloader — an open-source desktop app that bulk-downloads attachments from Gmail and Outlook with filtering, auto-renaming, and a modern dark GUI. Python, MIT licensed, runs on Windows, macOS, and Linux.

What it does

The workflow is simple: connect to your inbox over IMAP, filter, preview, download. But the details are where it earns its keep:

Filter by sender, subject, and date range — invoices@company.com + "invoice" + Jan–Mar gets you exactly the emails you need, with a built-in calendar picker for dates
File type selection — download only PDFs, or only spreadsheets, or images, documents, presentations, archives — your call
Preview before download — see every matching email and its attachments, deselect the noise, then pull the trigger
Auto-renaming patterns — this is my favorite part. invoice.pdf becomes 2024-01-15_invoice.pdf or john_2024-01-15_invoice.pdf automatically. Anyone who's ended up with invoice.pdf, invoice(1).pdf, invoice(14).pdf knows the pain this solves
Threaded downloads — attachments download in parallel without freezing the UI, with a real-time progress bar and activity log

The stack

Python 3.10+ with plain IMAP (imaplib) for email access — no Gmail API credentials, no OAuth app registration, no Google Cloud Console. An App Password and you're in
CustomTkinter for the GUI — if you think Tkinter apps have to look like Windows 95, CustomTkinter will change your mind. Modern dark theme, clean widgets, zero web stack
tkcalendar for the date picker

The architecture is deliberately modular — email_client.py handles IMAP, downloader.py handles extraction and saving, renamer.py is pure renaming logic.

Security — because it's your inbox

An app that asks for your email password deserves scrutiny, so here's the deal:

Your password is never stored — it lives in memory for the session only
It works with App Passwords (recommended), so your real password never touches the app
All connections use SSL/TLS (IMAP over port 993)
The app is read-only — it never modifies or deletes emails
And it's open source, so you don't have to take my word for any of this — read the code

The README has step-by-step App Password setup guides for both Gmail and Outlook, because that part trips up everyone the first time.

Honest limitations

It searches your INBOX folder over IMAP — if your invoices live in a label/subfolder, or your provider doesn't do IMAP, you're out of scope for now. And it's a desktop tool by design: no cloud, no web version, your credentials and files stay on your machine. I consider that last one a feature, but your mileage may vary.

There's a Windows installer if you just want to use it, or clone and pip install -r requirements.txt if you want to poke at the code. Issues and PRs welcome.

Links:

Repo: github.com/TsvetanG2/Email-Attachment-Downloader
Installer: Releases page

What's the most attachments you've ever had to download by hand? I'll start: enough to build this app.

I've built a open source PDF-To-Excel-Converter

Tsvetan Gerginov — Fri, 12 Jun 2026 20:31:49 +0000

Github Repository

Hi community,

I've built a open source PDF to Excel Converter and let me tell you why!

We've all been there: someone sends you a 40-page PDF report and asks for "the numbers in a spreadsheet by Friday." You can copy-paste cell by cell, pay for a SaaS converter that uploads your (possibly confidential) data to who-knows-where, or... build your own tool.

What it does

The core idea is simple: upload a PDF, pick an extraction mode, download an .xlsx. The interesting part is in the modes, because "convert a PDF" means different things depending on the document:

1. All Text + Tables — extracts everything (paragraphs, headings, tables) and consolidates it into a single worksheet. Useful when you need the full content of a document in a structured, searchable format.

2. Tables Only — ignores the prose and hunts down tabular data specifically. Each detected table lands in its own sheet in the workbook. This is the mode you want for financial reports, invoices, or anything where the tables are the data.

That second mode is where most converters fall short — they either flatten tables into mush or miss them entirely. Splitting each table into a separate sheet keeps the structure intact and makes downstream work (pivot tables, formulas, imports) actually possible.

The stack

Nothing exotic, and that's deliberate:

Python + Flask for the web app — file upload, mode selection, conversion, download. One form, one job.
pdfplumber for text and layout-aware extraction
tabula-py for table detection and extraction
A separate desktop version in the repo for people who don't want to run a server at all

Why two extraction libraries? Because PDFs are chaos. A PDF is fundamentally a visual format — it knows where to draw characters, not what a "table" is. pdfplumber is excellent at layout-aware text extraction, while tabula's table detection handles structured grids better. Using each for what it does best gives much more reliable output than forcing one library to do everything.

Why local-first matters

Most "free PDF converter" sites are upload services. That's fine for a recipe PDF — less fine for contracts, bank statements, or client data. This tool processes everything locally:

git clone https://github.com/TsvetanG2/PDF-To-Excel-Converter.git
cd pdf-to-excel-converter
pip install -r requirements.txt
python pdftoexcel.py

Then open http://localhost:5000, upload, convert, done. Your files never leave your machine.

Honest limitations

I'm not going to pretend this beats commercial tools on every PDF. Scanned documents (images of text) need OCR, which isn't in scope here — this works on PDFs with an actual text layer. And table detection on documents with creative, merged-cell layouts is a hard problem for every tool in this space, including this one. For typical reports, exports, and structured documents, it does the job well.