DEV Community

Giovanni Laquidara

What if the TV App of the Future Isn't an App but It's a Conversation?

Think about how you pick something to watch tonight. You open Netflix. You scroll. You read a synopsis. You scroll more. You check Hulu. More scrolling. You open a "what to watch" thread on Reddit. Twenty minutes later, you're still scrolling.

Now imagine this: you tell an AI agent "I'm in the mood for something like Severance but funnier," and a full streaming interface appears inline: poster cards, hero banners, horizontal rows, all inside the chat. You click a card and a detail overlay slides in with ratings and a description. You say "actually, something shorter, maybe a movie." The interface reshuffles instantly. You say "Great! Let's watch it!" A video player starts.

The Problem with Streaming UIs

Every major streaming platform ships essentially the same product. A horizontal-scroll grid organized by algorithmic rows. "Because You Watched X." "Trending Now." "Continue Watching." The UI paradigm hasn't changed in a decade.

The problem isn't the catalog or the recommendations. It's the input mechanism. A grid forces you to browse. You scan thumbnails, read titles, maybe click one to read a synopsis, back out, keep scanning. The cognitive load is on the user. You have to translate a vague feeling ("something light but not stupid") into a sequence of clicks and filters that the app was never designed to understand.

Natural language flips this. "Something light but not stupid, maybe a comedy-drama, nothing longer than two hours" is a single sentence that an AI model can parse into filters and return results that match the actual intent, not just keywords. The UI still matters: you want to see posters, read descriptions, watch trailers. But the main interaction becomes conversational.

MCP Apps are a way to make this possible today. Here's how.

What MCP Apps Are

The Model Context Protocol (MCP) already lets AI models call external tools, fetch data, run queries, trigger actions. MCP Apps extend that with a UI layer. When a tool returns results, instead of dumping raw text, it can render a widget directly in the chat interface.

A framework that makes this practical is mcp-use from Manufact, a full-stack TypeScript SDK for building MCP servers with interactive widgets. It provides the server runtime, the React hooks for widgets, the build toolchain, and crucially, dual-protocol support: widgets built with mcp-use work across both MCP Apps clients (Claude, Goose) and ChatGPT through automatic metadata transformation. Write once, deploy everywhere.

The mental model: the AI decides what to show, the widget decides how to show it.

A tool call returns both structured data for the model to reason about and a widget with props that renders an interactive interface. mcp-use enforces a clean data separation here: the model sees only the text summary (via output), while the widget receives structured props (via structuredContent) that never inflate the model's context window. The model stays in the loop for decisions that need intelligence ("find me something similar to this movie"), while the widget handles interactions that don't need AI ("click this card to see details").
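The shape of that separation can be illustrated with plain objects. This is a hypothetical splitResult helper to show the idea, not the actual mcp-use API:

```typescript
// Hypothetical illustration of the split, not the mcp-use API itself:
// the model-visible summary and the widget-only props travel in
// different fields of the tool result.
interface CatalogItem {
  title: string;
}

interface ToolResult {
  output: string;                              // what the model reads
  structuredContent: Record<string, unknown>;  // what the widget renders
}

function splitResult(items: CatalogItem[]): ToolResult {
  const preview = items.slice(0, 3).map((i) => i.title).join(', ');
  return {
    // Short summary: enough for the model to reason about the result.
    output: `Found ${items.length} titles: ${preview}`,
    // Full payload: everything the widget needs, invisible to the model.
    structuredContent: { viewMode: 'browse', items },
  };
}
```

The summary stays small no matter how large the catalog payload grows, which is the whole point of keeping the two channels separate.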

For streaming, this means the AI handles the understanding of "what you're in the mood for" and the widget handles the familiar part: showing you an interactive catalog you can browse, click, and play from.

The TV MCP App

To demonstrate this experience end-to-end, I've built a working MCP App you can run locally or connect to any MCP-compatible client today. The full source is open source at github.com/giolaq/tv-mcp-app: clone it, wire it to Claude Desktop or any MCP Apps-compatible client, and you have a conversational streaming interface running in minutes. The rest of this article walks through how it's built.

The Architecture: Server + Widget

The MCP App built with mcp-use has two parts:

Server (index.ts) — Uses MCPServer from mcp-use/server to define tools the AI model can call. Each tool does backend work (fetch data, filter, score) and returns responses via mcp-use's text() and widget() helpers.

Widget (resources/tv-streaming/widget.tsx) — A React component wired to the server via mcp-use's React hooks (useWidget, useCallTool, McpUseProvider from mcp-use/react). It receives props from tool calls and renders the UI. It can also call tools directly for instant interactions, or send messages back to the AI via sendFollowUpMessage.

Here's the project layout:

tv-mcp-app/
├── index.ts                          # Server: 7 tools, catalog logic
├── resources/tv-streaming/
│   └── widget.tsx                    # Widget entry point (mcp-use convention)
├── src/
│   ├── types.ts                      # Shared TypeScript interfaces
│   ├── hooks/                        # useCatalog, useKeyboardNav, useIntersectionPause
│   └── components/                   # HeroBanner, ContentRow, MovieCard, etc.
├── package.json
└── tsconfig.json

mcp-use uses a convention-based widget system. Widgets live under resources/ — either as single files (resources/weather-display.tsx) or as folders for complex UIs. This project uses the folder pattern: resources/tv-streaming/widget.tsx is the required entry point, with supporting components and hooks in src/.

This separation mirrors how the experience feels to the user. The server is the brain: it understands the catalog, filters it, scores recommendations. The widget is the screen: it renders the interface the user actually interacts with. The AI model is the remote control, except instead of up/down/select, it understands "something like that but scarier."

Setting Up the Server with mcp-use

The server initializes with MCPServer from mcp-use/server:

import {MCPServer, text, widget} from 'mcp-use/server';

const port = Number(process.env.PORT ?? 3000); // assumed default; `port` is referenced below

const server = new MCPServer({
  name: 'tv-streaming',
  title: 'TV Streaming',
  version: '1.0.0',
  description:
    'A TV streaming assistant that helps users discover, filter, recommend, and play content.',
  host: process.env.HOST ?? '0.0.0.0',
  baseUrl: process.env.MCP_URL ?? `http://localhost:${port}`,
});

// At the bottom of the file:
await server.listen(port);

MCPServer handles MCP protocol negotiation, widget asset serving, session management, and the double-iframe sandbox architecture that mcp-use uses for security isolation. The baseUrl tells the server where widgets will be served from.

Defining Tools: Two Categories

This project defines 7 tools split into two categories that serve fundamentally different purposes.

Model-Visible Tools (5)

These are tools the AI model calls in response to user messages. Each one uses mcp-use's server.tool() to define a name, Zod schema, widget binding, and handler function.

For instance, discover_content is the main browsing tool:

server.tool(
  {
    name: 'discover_content',
    description:
      'Browse and filter the TV streaming catalog. Opens the TV widget with filtered results.',
    schema: z.object({
      query: z
        .string()
        .optional()
        .describe('Free-text search across titles and descriptions'),
      genres: z.array(z.string()).optional().describe('Filter by genre(s)'),
      category: z.string().optional().describe('Filter by category'),
      min_rating: z
        .number()
        .min(1)
        .max(5)
        .optional()
        .describe('Minimum star rating'),
      trending_only: z
        .boolean()
        .optional()
        .describe('Only show trending titles'),
      sort_by: z.enum(['rating', 'year', 'title']).optional(),
      limit: z.number().min(1).max(30).optional(),
    }),
    widget: {
      name: 'tv-streaming',
      invoking: 'Searching catalog...',
      invoked: 'Results ready',
    },
  },
  async ({
    query,
    genres,
    category,
    min_rating,
    trending_only,
    sort_by,
    limit,
  }) => {
    const catalog = await getCatalog();
    let results = [...catalog.items];

    // Apply filters...
    if (genres?.length) {
      const genresLower = genres.map((g) => g.toLowerCase());
      results = results.filter((i) =>
        i.genres?.some((g) => genresLower.includes(g.toLowerCase())),
      );
    }
    // ... more filtering and sorting, producing filteredCatalog,
    // activeFilters, and a text summary for the model ...

    return widget({
      props: {
        viewMode: 'browse',
        catalog: JSON.stringify(filteredCatalog),
        filters: JSON.stringify(activeFilters),
      },
      output: text(summary),
    });
  },
);
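The elided filtering and sorting could look something like the following. This is my own reconstruction under an assumed minimal item shape, not the repository's exact code:

```typescript
// Assumed minimal item shape; the real catalog has more fields.
interface CatalogItem {
  id: string;
  title: string;
  rating?: number;
  year?: number;
  trending?: boolean;
}

// One plausible shape for the remaining filter/sort steps of
// discover_content, applied after the genre filter shown above.
function applyFilters(
  items: CatalogItem[],
  opts: {
    min_rating?: number;
    trending_only?: boolean;
    sort_by?: 'rating' | 'year' | 'title';
  },
): CatalogItem[] {
  let results = items;
  const min = opts.min_rating;
  if (min !== undefined) {
    results = results.filter((i) => (i.rating ?? 0) >= min);
  }
  if (opts.trending_only) {
    results = results.filter((i) => i.trending === true);
  }
  if (opts.sort_by === 'rating') {
    results = [...results].sort((a, b) => (b.rating ?? 0) - (a.rating ?? 0));
  } else if (opts.sort_by === 'year') {
    results = [...results].sort((a, b) => (b.year ?? 0) - (a.year ?? 0));
  } else if (opts.sort_by === 'title') {
    results = [...results].sort((a, b) => a.title.localeCompare(b.title));
  }
  return results;
}
```

Keeping these steps pure makes them trivial to unit-test independently of the MCP plumbing.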

Three things to note about the mcp-use patterns at work:

  1. The schema uses Zod (which mcp-use uses throughout for validation). The AI model reads these parameter descriptions to decide which fields to fill. When a user says "show me comedies," the model maps that to { genres: ["Comedy"] }. When they say "top-rated sci-fi," it maps to { genres: ["Sci-Fi"], min_rating: 4, sort_by: "rating" }.

  2. The widget config binds the tool to a named widget ("tv-streaming", matching the folder under resources/). The invoking and invoked strings are status messages displayed to the user during tool execution.

  3. mcp-use's widget() helper enforces data separation. The props object becomes structuredContent, invisible to the model but available to the widget via useWidget().props. The output text (wrapped in mcp-use's text() helper) becomes content visible to the model for reasoning. The model sees "Found 8 comedies, here are the top 5..." as context for the conversation, while the widget receives the full filtered catalog for rendering.

All the model-visible tools follow the same pattern:

| Tool | Purpose |
| --- | --- |
| discover_content | Browse/filter/search the catalog |
| get_title_details | Show detail overlay for a specific title |
| play_title | Start video playback |
| get_recommendations | Find similar titles by scoring |
| get_catalog_overview | Return catalog stats as text (no widget) |

Together, these five tools cover the full user journey from "what to watch?" to watching. The AI orchestrates the flow and the user just talks.
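The scoring behind get_recommendations can be approximated with a simple weighted overlap. This is a sketch of the idea under assumed fields, not the repository's exact scoring:

```typescript
// Assumed minimal item shape for scoring purposes.
interface CatalogItem {
  id: string;
  genres?: string[];
  rating?: number;
}

// Score a candidate against a seed title: shared genres dominate,
// with a small penalty for diverging ratings.
function scoreSimilarity(seed: CatalogItem, candidate: CatalogItem): number {
  const seedGenres = new Set((seed.genres ?? []).map((g) => g.toLowerCase()));
  const shared = (candidate.genres ?? []).filter((g) =>
    seedGenres.has(g.toLowerCase()),
  ).length;
  const ratingGap = Math.abs((seed.rating ?? 0) - (candidate.rating ?? 0));
  return shared * 10 - ratingGap;
}

// Rank the catalog by similarity to the seed, excluding the seed itself.
function recommend(
  seed: CatalogItem,
  catalog: CatalogItem[],
  limit = 5,
): CatalogItem[] {
  return catalog
    .filter((c) => c.id !== seed.id)
    .map((c) => ({ c, score: scoreSimilarity(seed, c) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((x) => x.c);
}
```

The point of routing this through the model rather than hard-coding it in the widget is that the model can adjust the candidates against conversational context the scoring function can't see.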

App-Only Tools (2)

Some interactions don't need AI at all. When you click a movie poster, you want the detail overlay now, not after a 2-second round-trip to a language model.

mcp-use supports this with useCallTool, a React hook that lets the widget invoke server tools directly, bypassing the model entirely:

server.tool(
  {
    name: 'widget_show_details',
    description:
      'App-only: Widget calls this when a user clicks a card to show the detail overlay.',
    schema: z.object({
      id: z.string().describe('The title ID to show details for'),
    }),
    widget: {
      name: 'tv-streaming',
      invoking: 'Loading details...',
      invoked: 'Details ready',
    },
  },
  async ({id}) => {
    const catalog = await getCatalog();
    const item = catalog.items.find((i) => i.id === id);
    if (!item) {
      throw new Error(`Unknown title ID: ${id}`);
    }
    return widget({
      props: {
        viewMode: 'details',
        detailItemId: item.id,
        catalog: JSON.stringify({...catalog, items: [item]}),
      },
      output: text(`Showing details for: ${item.title}`),
    });
  },
);

On the widget side, useCallTool from mcp-use/react creates a typed handle:

const showDetailsToolCall = useCallTool<{id: string}>('widget_show_details');

// When a card is clicked:
showDetailsToolCall.callTool({id: item.id}); // Instant. No AI round-trip.

The hook provides isPending, isSuccess, isError, and data states. mcp-use also auto-generates TypeScript types from your tool schemas during mcp-use dev, so you get full IntelliSense on tool names and parameters.

The Widget: React in the Chat

The widget is a React component, wrapped with mcp-use's McpUseProvider and wired to the server via hooks. It looks and feels like a native streaming app: dark theme, hero banners, horizontal scroll rows, poster cards with hover effects, a video player with custom controls. The difference is where it lives: inside an AI conversation.

The widget's design is directly inspired by my work at Amazon on react-native-multi-tv-app-sample, a production-ready template for building cross-platform TV apps with React Native across Android TV, Apple TV, Fire TV, Vega OS, and web. I've borrowed the same visual language: a dynamic hero header that updates based on focused content, horizontal content rows with poster cards, a detail overlay with metadata and action buttons, and a custom video player with spatial-navigation-friendly controls.

Three Interaction Patterns

The widget uses three distinct patterns for different types of user actions.

Pattern 1: App-only tool call via useCallTool (instant, no AI)

When a user clicks a movie card:

const handleSelectItem = useCallback(
  (item: CatalogItem) => {
    setSelectedItem(item);
    setView('details');
    showDetailsToolCall.callTool({id: item.id}); // No AI round-trip
  },
  [showDetailsToolCall],
);

The UI updates immediately. This is indistinguishable from clicking a card on Netflix. The callTool fires a request to the server, which returns fresh data as widget props, but the model is never involved. mcp-use routes the call directly through its JSON-RPC postMessage bridge.

Pattern 2: Follow-up message via sendFollowUpMessage (needs AI intelligence)

When a user clicks "More Like This" in the detail overlay:

const handleMoreLikeThis = useCallback(
  (item: CatalogItem) => {
    sendFollowUpMessage(
      `Show me titles similar to "${item.title}" (ID: ${item.id})`,
    );
  },
  [sendFollowUpMessage],
);

This sends a message as if the user typed it. The AI model receives it, decides to call get_recommendations, the server scores candidates, and the widget re-renders with new results. The AI stays in the loop because "similar" is a judgment call (genre overlap, tone, rating proximity) and the model can weigh these factors in context. Maybe the user previously said "nothing too dark" and the model remembers that when filtering recommendations.

Pattern 3: Pure client-side (no server at all)

Search filtering, keyboard navigation, view transitions — these happen entirely in the widget with React state:

const [searchQuery, setSearchQuery] = useState('');
const rows = useMemo(
  () => filterRows(allRows, searchQuery),
  [allRows, searchQuery],
);

No tool call needed. The data is already in the widget; just filter it locally.
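filterRows itself can be as simple as a case-insensitive match over each row's items. This is a sketch under assumed types; the real component may match more fields:

```typescript
// Assumed row/item shapes for the sketch.
interface RowItem {
  title: string;
  description?: string;
}

interface ContentRow {
  label: string;
  items: RowItem[];
}

// Keep only items whose title or description matches the query,
// and drop rows that end up empty. An empty query passes everything through.
function filterRows(rows: ContentRow[], query: string): ContentRow[] {
  const q = query.trim().toLowerCase();
  if (!q) return rows;
  return rows
    .map((row) => ({
      ...row,
      items: row.items.filter(
        (i) =>
          i.title.toLowerCase().includes(q) ||
          (i.description ?? '').toLowerCase().includes(q),
      ),
    }))
    .filter((row) => row.items.length > 0);
}
```

Wrapped in useMemo as shown above, this re-runs only when the query or the underlying rows change.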

Server-Side Patterns Worth Stealing

Catalog Caching with TTL

The server fetches from a remote JSON feed and caches for 5 minutes:

let cachedCatalog: Catalog | null = null;
let cacheTime = 0;
const CACHE_TTL = 5 * 60 * 1000;

async function getCatalog(): Promise<Catalog> {
  const now = Date.now();
  if (cachedCatalog && now - cacheTime < CACHE_TTL) {
    return cachedCatalog;
  }
  const res = await fetch(CATALOG_URL);
  cachedCatalog = (await res.json()) as Catalog;
  cacheTime = now;
  return cachedCatalog;
}

Every tool call goes through getCatalog(). The first call fetches; subsequent calls within 5 minutes use cache.
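One refinement worth considering (my addition, not something the repo does): fall back to the stale cache when a refresh fetch fails, so a flaky feed doesn't take the whole app down. With the fetcher injected as a parameter, the behavior is easy to test:

```typescript
interface Catalog {
  items: unknown[];
}

let cachedCatalog: Catalog | null = null;
let cacheTime = 0;
const CACHE_TTL = 5 * 60 * 1000;

// Same TTL cache as above, but a failed refresh serves stale data
// instead of throwing. The fetcher is injected to keep this testable.
async function getCatalogSafe(
  fetchCatalog: () => Promise<Catalog>,
  now = Date.now(),
): Promise<Catalog> {
  if (cachedCatalog && now - cacheTime < CACHE_TTL) {
    return cachedCatalog;
  }
  try {
    const fresh = await fetchCatalog();
    cachedCatalog = fresh;
    cacheTime = now;
    return fresh;
  } catch (err) {
    if (cachedCatalog) {
      console.warn('Catalog refresh failed, serving stale cache');
      return cachedCatalog;
    }
    throw err; // no cached copy to fall back to
  }
}
```

In production you would call it as `getCatalogSafe(() => fetch(CATALOG_URL).then((r) => r.json()))`.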

Running It

The project uses mcp-use's CLI for development and builds:

{
  "scripts": {
    "build": "NODE_OPTIONS='--max-old-space-size=8192' mcp-use build",
    "dev": "mcp-use dev",
    "start": "mcp-use start"
  },
  "dependencies": {
    "mcp-use": "^1.19.2-canary.2",
    "react": "^19.2.0",
    "react-dom": "^19.2.0",
    "zod": "4.3.5"
  }
}
npm install
npm run dev

mcp-use dev starts the server with hot reload, auto-generates TypeScript types from your tool schemas, and serves an inspector at /inspector for testing tools and widget rendering across protocols.

For public access through a tunnel:

npm run dev -- --tunnel

mcp-use creates a persistent subdomain like https://<subdomain>.local.mcp-use.run/mcp. For production, you can deploy to Manufact Cloud with npx mcp-use deploy.

To connect it to Claude Desktop or ChatGPT, add the MCP server URL to the host's config.

What This Means for Streaming

An AI agent flips the usual UX model of TV streaming apps. The input is natural language. The output is still a rich visual interface: you don't lose the posters, the trailers, or the browse-and-discover experience that makes streaming fun.

The TV app of the future probably won't be an app you open. It'll be a widget that appears when you say what you're in the mood for.
