Abhinandan Sharma

Cursor for PPT | MCP Exploration

I was chilling with a friend (who works at a Big 4 firm) while he complained about work-life balance and booked tickets to Goa for an onsite trip (after visiting Thailand last month), the usual. We got into a discussion about AI and the general belief that it will eat dev jobs, as if that would scare me (I make excellent tea, so my backup plan is always there). That's when he brought up a crucial problem that many consultants, bankers and other suit people face: PPT automation.

Companies have internal tools connected to internal data sources that automate much of the task at hand, but there is more to it. Small consulting, compliance and analytics firms need PowerPoint presentations automated with a fixed suite of components and a consistent design. And that's when it hit me.

I did some market research on Perplexity and found that these already exist:

  1. Beautiful.ai, which focuses on automating decks with its own set of templates.
  2. Slidebean, which helps founders make flashy decks to loot investors.

But I couldn't find a proper enterprise-level product that can take in the styles proposed by the tenant (client) and the data sources fixed by them, and produce decks accordingly.
Generating everything in one go is risky because AI can hallucinate. If it is prompted to create a deck from scratch, we are bound to get inconsistent designs, wrong details, no org-level styling and, most of all, loss of control.

As a software developer with no experience in building businesses, I started implementing. (Obviously I didn't reach out to any clients to check whether there is a need, or what the need is, because I am a cutie devootie.)

Anyway, I started reading about agentic AI, and one famous, buzzing keyword I stumbled on pretty quickly was: MCP.

MCP stands for Model Context Protocol, and here is an awesome course explaining everything about it: Hugging Face MCP Course

Some key concepts, just for a brush up:

  1. Host, Client and Server: The Host is the user-facing AI application that end-users interact with directly. The Client is a component within the Host application that manages communication with a specific MCP Server. The Server is an external program or service that exposes capabilities to AI models via the MCP protocol. A single Host can connect to multiple Servers simultaneously via different Clients. Each Client maintains a 1:1 connection with a single Server. New Servers can be added to the ecosystem without requiring changes to existing Hosts. Capabilities can be easily composed across different Servers.
  2. MCP works on JSON-RPC: JSON-RPC is a lightweight remote procedure call protocol encoded in JSON. A Request message includes a unique identifier (id), the method name to invoke (e.g., tools/call) and parameters for the method (if any).
  3. Transport Mechanisms: JSON-RPC defines the message format, but MCP also specifies how these messages are transported between Clients and Servers. There are two: stdio and HTTP + SSE/Streamable HTTP. The stdio transport is used for local communication, where the Client and Server run on the same machine. The HTTP + SSE transport is used for remote communication: messages travel over HTTP, and the Server uses Server-Sent Events (SSE) to push updates to the Client over a persistent connection.
  4. Tools are executable functions or actions that the AI model can invoke through the MCP protocol.
  5. Resources provide read-only access to data sources, allowing the AI model to retrieve context without executing complex logic.
  6. Prompts are predefined templates or workflows that guide the interaction between the user, the AI model, and the Server’s capabilities.
  7. Sampling allows Servers to ask the Client (specifically, the Host application) to perform LLM interactions on their behalf.
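To make the JSON-RPC point concrete, here is a minimal sketch of the request a Client sends when the model wants to call a tool. The tool name `add_textbox` and its arguments are made up for illustration; only the envelope (`jsonrpc`, `id`, `method`, `params`) follows the protocol.

```python
import json
from itertools import count

# Each request on a connection needs its own id so the response can be matched.
_ids = count(1)


def make_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 request that invokes an MCP tool."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        # MCP puts the tool name and its arguments inside params.
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


# Hypothetical tool and arguments, not taken from the actual project.
message = make_tool_call("add_textbox", {"slide_id": 1, "text": "Q3 revenue"})
print(message)
```

Over stdio this string would be written to the Server's standard input; over HTTP it would be the request body.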

Now that we know how MCP servers work, it's time to lay out a plan.
This is what I came up with at first (and implemented!).

The first problem is the loss of control and the inconsistent styling we get when we ask AI to generate the whole thing. So, I came up with the concept of components. In React, we can use a CSS framework that provides ready-made components, which not only ease dev work but also keep the styles consistent and abstract away intricacies like padding a specific element. So, why not have components in PPT as well?

For the first MVP, I set these requirements:

  1. There is a component list that AI can generate from, starting with small components like a textbox, bullet points etc.
  2. There should be at least two themes, each defining the overall color scheme of the document.
  3. The PPT generation is handled by our server.
  4. The PPT is saved locally.

Claude happens to have excellent MCP server connectivity. So, I started with this simple architecture:

Claude connects to my MCP server. The MCP server contains a separate tool for each component, and each tool calls a FastAPI server over REST. The FastAPI server itself makes the changes in the PPT. Keep in mind that we are building it for local use right now and this is the first iteration.
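As a rough sketch of the tool-to-REST hop, the body of one such tool could look like this. It uses only the standard library and stops at building the request, so nothing is sent. The base URL, the POST method, the payload shape and the function names are assumptions; the path is the `/slide/{id}/component` endpoint this post uses for the server.

```python
import json
from urllib.request import Request

API_BASE = "http://localhost:8000"  # assumed address of the local FastAPI server


def build_component_request(slide_id: int, component: dict) -> Request:
    """Build the HTTP request a tool would send to add a component to a slide."""
    body = json.dumps(component).encode("utf-8")
    return Request(
        url=f"{API_BASE}/slide/{slide_id}/component",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",  # assumption: adding a component is a POST
    )


def add_textbox(slide_id: int, text: str) -> Request:
    """Hypothetical tool body: one small function per component type."""
    return build_component_request(slide_id, {"type": "textbox", "text": text})


req = add_textbox(1, "Quarterly summary")
print(req.get_method(), req.full_url)
```

In the real server each of these functions would be registered as an MCP tool and the request would be sent with `urllib.request.urlopen` or an HTTP client of your choice.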

The server will look like:

We have a single endpoint (/slide/{id}/component) for adding a component to a particular slide, with a simple request body that maps to the component class. Pretty straightforward.
Separate endpoints handle saving the deck and setting a theme. Since we are only running locally right now, I haven't added a PPT id. We will keep improving the current code over time.
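Stripped of the web framework, the logic behind these endpoints could be sketched as below. This is a stand-in, not the MVP code: it keeps the deck in memory instead of editing a real PPT file, and the allowed component types, theme names and return shapes are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Deck:
    theme: str = "light"
    # slide id -> components added to that slide, in order
    slides: dict[int, list[dict]] = field(default_factory=dict)


# A single module-level deck, because the local MVP has no PPT id yet.
DECK = Deck()
ALLOWED_TYPES = {"textbox", "bullets"}
ALLOWED_THEMES = {"light", "dark"}


def add_component(slide_id: int, payload: dict) -> dict:
    """Handler logic for /slide/{id}/component."""
    kind = payload.get("type")
    if kind not in ALLOWED_TYPES:
        raise ValueError(f"unknown component type: {kind!r}")
    DECK.slides.setdefault(slide_id, []).append(payload)
    return {"slide": slide_id, "components": len(DECK.slides[slide_id])}


def set_theme(name: str) -> dict:
    """Handler logic for the theme endpoint."""
    if name not in ALLOWED_THEMES:
        raise ValueError(f"unknown theme: {name!r}")
    DECK.theme = name
    return {"theme": name}
```

Rejecting unknown component types at this layer is what keeps the AI inside the component library: it can only ask for things the server knows how to draw.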

This is how it worked:

GitHub URL for the first MVP version:

Now that we have built a foundation, let's build on it and look at the next set of requirements.

Final Idea

Cursor-like software for PowerPoint: an add-in that acts as a chat interface. For each change, the user can accept or reject it component-wise, and previous changes are kept in memory. Any other file can be accessed as a reference, like "From @analysis.excel get the data and convert it into chart data and add it as a pie chart."

This is an ambitious vision, but it is feasible. So, let's start with the requirements:

  1. PowerPoint add-in for the chatbot interface
  2. Accepting/rejecting changes component-wise
  3. Accessing any other files as reference
  4. Generating charts
  5. Expanding and beautifying the component library
  6. The tenant decides the component library. For each component, there may be a suite of styles to pick from.
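The accept/reject requirement is the one that most shapes the data model, so here is a small sketch of how pending changes might be tracked. Nothing like this exists in the MVP yet; the `Change` and `ChangeLog` names and the status values are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    change_id: int
    slide_id: int
    component: dict
    status: str = "pending"  # becomes "accepted" or "rejected"


@dataclass
class ChangeLog:
    """Keeps every proposed change in memory so the chat can refer back to it."""
    changes: list[Change] = field(default_factory=list)

    def propose(self, slide_id: int, component: dict) -> Change:
        change = Change(len(self.changes) + 1, slide_id, component)
        self.changes.append(change)
        return change

    def resolve(self, change_id: int, accept: bool) -> Change:
        for change in self.changes:
            if change.change_id == change_id:
                if change.status != "pending":
                    raise ValueError(f"change {change_id} is already {change.status}")
                change.status = "accepted" if accept else "rejected"
                return change
        raise KeyError(change_id)

    def accepted(self, slide_id: int) -> list[dict]:
        """Components that should actually be drawn on the slide."""
        return [
            c.component
            for c in self.changes
            if c.slide_id == slide_id and c.status == "accepted"
        ]
```

Rejected changes stay in the log rather than being deleted, so the assistant can still see what the user turned down.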

The next blogs will come in phases, and with each one we will get a step closer to the final goal.


Connect with me

I post about AI tools, coding challenges, and product experiments.

🧵 Follow me on Twitter: @abhinandan824

💼 Let’s connect on LinkedIn: https://www.linkedin.com/in/abhinandan-sharma-dtu/
