Mark Turner

Add video transcoding to your Claude agent in 5 minutes (MCP)

Teach your Claude Agent to process Zoom recordings and extract audio in 5 minutes (MCP)

As IT developers, we are constantly tasked with building internal tools to automate messy, repetitive workflows. With the rise of AI agents, it’s now incredibly easy to build a Claude-powered bot that manages tickets, audits logs, or summarizes text.

But things fall apart the moment a user drops a massive 2GB raw Zoom recording, a Microsoft Teams .webm export, or a screen-share video into the chat and asks the agent to "compress this for the wiki" or "extract the audio so we can transcribe it."

Suddenly, your lightweight AI agent needs to be a media engineering wizard. Your options? Either force a local installation of FFmpeg (and deal with cross-platform binary dependencies breaking in production) or spend days configuring AWS MediaConvert pipelines, S3 buckets, IAM roles, and webhooks.

Spoiler alert: You shouldn't have to build cloud infrastructure just to downsample a corporate meeting recording.

Thanks to Anthropic’s Model Context Protocol (MCP) and a developer-friendly platform called Botverse, you can give your Claude Agent video-transcoding and audio-extraction capabilities in about 5 minutes, without writing a single line of infrastructure code.


🛠️ The 5-Minute Setup

To give your local Claude Desktop agent video-processing capabilities, you just need to connect the Botverse remote MCP server to your client.

  1. Sign up at botverse.cloud and copy your API token from the dashboard.
  2. Open your Claude Desktop configuration file:
       • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
       • Windows: %APPDATA%\Claude\claude_desktop_config.json
  3. Add the botverse configuration block under the mcpServers object:

{
  "mcpServers": {
    "botverse": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "https://botverse.cloud/mcp?token=YOUR_BOTVERSE_TOKEN"
      ],
      "env": {}
    }
  }
}

Replace YOUR_BOTVERSE_TOKEN with your actual token, save the file, and restart Claude Desktop. That’s it. Claude can now call the Botverse tools to transcode video and extract audio.
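If your config file already lists other MCP servers, hand-editing risks clobbering them. Here is a minimal Python sketch that merges the botverse block from above into an existing file. The function name add_botverse_server is hypothetical, and the path and token are whatever you pass in:

```python
import json
from pathlib import Path


def add_botverse_server(config_path: Path, token: str) -> dict:
    """Merge the botverse entry into a Claude Desktop config, keeping other servers."""
    # Start from the existing config if the file exists and is non-empty.
    text = config_path.read_text(encoding="utf-8") if config_path.exists() else ""
    config = json.loads(text) if text.strip() else {}

    # Create the mcpServers object if it is missing, then add or replace botverse.
    servers = config.setdefault("mcpServers", {})
    servers["botverse"] = {
        "command": "npx",
        "args": ["-y", "mcp-remote", f"https://botverse.cloud/mcp?token={token}"],
        "env": {},
    }

    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

On macOS you might call it as add_botverse_server(Path.home() / "Library/Application Support/Claude/claude_desktop_config.json", "YOUR_BOTVERSE_TOKEN"), then restart Claude Desktop as usual.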


🔄 The IT Developer Workflow Under the Hood

Once connected, Claude automatically discovers the new media tools. When you ask Claude to handle a video file, it autonomously orchestrates a clean, 3-step asynchronous workflow:

1. transcode_from_url

Claude kicks off the process by sending the raw video URL (like a direct link to a cloud-stored meeting recording) straight to Botverse. You don't have to upload massive files into your LLM prompt context.

  • For video compression: You can tell Claude to convert a massive raw file to a web-friendly 720p MP4.
  • For data/text extraction: You can instruct Claude to strip the video entirely and extract just the MP3 or WAV audio—perfect for feeding into a transcription API like Whisper to generate meeting notes.

2. get_job_status

Media processing takes time. Instead of blocking on one long request or hitting a network timeout, Claude calls this tool repeatedly to check the job's progress until it finishes.

3. get_download_url

Once the job status reports completion, Claude calls this final tool to retrieve a secure, signed download URL for the newly generated asset.
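If you ever need to drive the same three tools from your own code, the workflow boils down to a submit, poll, fetch loop. Here is a minimal Python sketch, assuming a generic call_tool(name, arguments) callable that stands in for whatever MCP client you use. The outputs parameter shape and the "Completed" status follow the example session later in this post. The job_id argument name and the response fields (job_id, status, download_url) are assumptions, not the documented Botverse schema:

```python
import time
from typing import Any, Callable

# Hypothetical stand-in for an MCP client's "call this tool" function.
ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]


def transcode_and_fetch(
    call_tool: ToolCaller,
    source_url: str,
    output_format: str = "mp3",
    poll_seconds: float = 5.0,
    max_polls: int = 120,
) -> str:
    """Submit a job, poll until it completes, and return the download URL."""
    # Step 1: submit the job by URL; no file bytes pass through the caller.
    job = call_tool(
        "transcode_from_url",
        {"url": source_url, "outputs": [{"format": output_format}]},
    )
    job_id = job["job_id"]  # assumed field name

    # Step 2: poll for status, giving up after max_polls checks.
    for _ in range(max_polls):
        status = call_tool("get_job_status", {"job_id": job_id})["status"]  # assumed field
        if status.lower() == "completed":
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"job {job_id} did not complete after {max_polls} polls")

    # Step 3: fetch the signed download URL for the finished asset.
    return call_tool("get_download_url", {"job_id": job_id})["download_url"]  # assumed field
```

Capping the number of polls keeps a stuck job from hanging the agent forever; tune poll_seconds and max_polls to the length of the recordings you expect.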


📸 See It In Action

Imagine an internal Slack or desktop bot where a developer or project manager needs to extract audio from a town hall meeting. You can type a natural language command:

"Extract the audio from this raw recorded meeting link as an MP3 so I can run a transcript on it: https://storage.company.internal/meeting_10823.webm"

Claude handles the tool coordination automatically:

[Claude Desktop UI]
🤖 Calling tool: botverse.transcode_from_url... 
   ↳ Parameters: { url: "...", outputs: [{ format: "mp3" }] }
   ↳ Status: Job created (ID: job_dev_7812)

🤖 Calling tool: botverse.get_job_status (job_dev_7812)... 
   ↳ Status: Processing (Audio extraction in progress...)

🤖 Calling tool: botverse.get_job_status (job_dev_7812)... 
   ↳ Status: Completed

🤖 Calling tool: botverse.get_download_url (job_dev_7812)...
   ↳ Signed URL retrieved!

"I have successfully extracted the audio from your meeting recording. You can download the MP3 file here to pass to your transcription pipeline: [Download Meeting Audio](https://botverse.cloud/d/xyz123...)"

💰 Predictable Pricing, Zero Idle Server Costs

We all hate surprise cloud bills from idle infrastructure. Botverse uses a transparent, pay-as-you-go model that keeps costs entirely predictable:

  • $0.25 per job (for standard source video files under 5 minutes).
  • +$0.08 per minute for overage on longer files (like 30-minute standups or hour-long webinars).
  • $2.50 minimum top-up to fund your developer wallet and get started.

There are no fixed monthly subscriptions, no base fees, and your credits never expire. You only pay when your agent is actively processing media.
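To get a feel for what a given recording would cost, here is a rough Python estimator built from the rates listed above. Two things are assumptions on my part: that overage starts counting after the first 5 minutes, and that partial minutes round up. Check the Botverse docs for the exact billing rules:

```python
import math
from decimal import Decimal

BASE_FEE = Decimal("0.25")         # per job, from the price list above
OVERAGE_PER_MIN = Decimal("0.08")  # per minute of overage, from the price list above
INCLUDED_MINUTES = 5               # assumption: overage starts after 5 minutes


def estimate_job_cost(duration_minutes: float) -> Decimal:
    """Estimate one job's cost in USD; rounding overage up is an assumption."""
    overage_minutes = max(0, math.ceil(duration_minutes - INCLUDED_MINUTES))
    return BASE_FEE + OVERAGE_PER_MIN * overage_minutes


print(estimate_job_cost(3))   # 0.25
print(estimate_job_cost(30))  # 2.25
print(estimate_job_cost(60))  # 4.65
```

Under those assumptions, the $2.50 minimum top-up covers ten short clips, or a single 30-minute standup with $0.25 to spare.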


🚀 Next Steps

Stop wasting time writing boilerplate infrastructure code, debugging FFmpeg layers in Docker containers, or over-engineering cloud pipelines for simple internal tools. Let MCP do the heavy lifting.

  • 🌐 Get Started: Head over to botverse.cloud to grab your API token.
  • 📚 Read the Docs: Check out the Botverse Documentation for more advanced parameters, document conversions, and agent automation blueprints.
